Sir:



BRIEF ON APPEAL 



Further to the Notice of Appeal filed November 5, 2003, and received by the USPTO on 
November 7, 2003, herewith are three copies of Appellants' Brief on Appeal. Authorized fees 
include the $330.00 fee for the filing of this Brief.



This is an appeal from the decision of the Examiner finally rejecting claims 1-6 of the
above-identified application. 

(1) REAL PARTY IN INTEREST

The above-identified application is assigned of record to Incyte Pharmaceuticals, Inc.,
(now Incyte Corporation, formerly known as Incyte Genomics, Inc.) (Reel 012272, Frame 0191) 
which is the real party in interest herein. 

(2) RELATED APPEALS AND INTERFERENCES
Appellants, their legal representative and the assignee are not aware of any related 
appeals or interferences which will directly affect or be directly affected by or have a bearing on 
the Board's decision in the instant appeal. 



(3) STATUS OF THE CLAIMS

Claims rejected: Claims 1-6.
Claims allowed: (none).
Claims canceled: Claims 13-20.
Claims withdrawn: Claims 7-12.
Claims on Appeal: Claims 1-6 (A copy of the claims on appeal, as amended, can be
found in the attached Appendix.)



(4) STATUS OF AMENDMENTS AFTER FINAL
There were no amendments submitted after Final Rejection. 



(5) SUMMARY OF THE INVENTION 
Appellants' invention is directed to polynucleotides encoding a human G-protein coupled 
receptor (GPCR), SEQ ID NO:1, in particular, a metabotropic glutamate GPCR, based on the
conservation of various sequence motifs characteristic of this family of proteins, in particular, the
seven hydrophobic transmembrane domains characteristic of GPCRs. See specification, at page
11 and Table 1 and Figure 1. The glutamate GPCRs are described in the specification and the art
of record as important in neurotransmission and involved in neurological disorders such as 
epilepsy, stroke, and neurodegeneration. See specification, at page 2. Polynucleotides encoding 
SEQ ID NO: 1 are also disclosed as differentially expressed in thyroid tumors, in particular, 
follicular carcinoma based on Northern analysis in thyroid tissues. See specification, at page 35. 
The claimed polynucleotides are asserted to be useful in the diagnosis, treatment, and evaluation 
of therapies for neurological and neoplastic disorders, in particular, follicular carcinoma. 

(6) ISSUES

1. Whether claims 1-6 directed to SEQ ID NO:1 encoding polynucleotides meet the
utility requirement of 35 U.S.C. § 101. In particular, whether the conservation of sequence motifs
and domains between the protein coded for by the claimed polynucleotide and metabotropic
GPCRs, known to have utility in neurotransmission and neurological disorders, demonstrates a
"substantial likelihood" of utility under 35 U.S.C. § 101. Whether there is evidence that the
differential expression of the polynucleotide encoding SEQ ID NO: 1 in thyroid tumors provides
a substantial likelihood of utility for the claimed polynucleotides in the detection and diagnosis
of thyroid tumors.

2. Whether one of ordinary skill in the art would know how to use the claimed
polynucleotides, e.g., in toxicology testing, drug development, and the diagnosis of disease, so as
to satisfy the enablement requirement of 35 U.S.C. § 112, first paragraph.

3. Whether fragments and variants of the polynucleotides encoding SEQ ID NO: 1 
are sufficiently described in the specification that the skilled artisan would recognize applicant's 
possession of them at the time the application was filed in accordance with 35 U.S.C. § 112, First 
Paragraph. 

4. Whether the claimed polynucleotides are sufficiently described in priority 
application Serial No. 09/516,513, filed September 17, 1998 to meet the requirements of 35 
U.S.C. § 112, first paragraph and claim an effective priority date of September 17, 1998 with
respect to the now claimed invention. 

(7) GROUPING OF THE CLAIMS 

As to Issue 1

All of the claims on appeal stand or fall together.

As to Issue 2

All of the claims on appeal stand or fall together.

As to Issue 3

All of the claims on appeal stand or fall together.

As to Issue 4

Claims 1 and 3-6 stand or fall together.

(8) APPELLANTS' ARGUMENTS 

The rejection of claims 1-6 under 35 U.S.C. §§ 101 and 112, first paragraph is improper, as 
the inventions of those claims have a patentable utility as set forth in the instant specification, 
and/or a utility well known to one of ordinary skill in the art.

Claims 1-6 stand rejected under 35 U.S.C. §§ 101 and 112, first paragraph, based on the
allegation that the claimed invention lacks patentable utility. The rejection alleges in particular 
that: 

• the claimed invention is not supported by either a substantial and specific asserted utility
or a well-established utility. None of the described uses are considered to be specific or
substantial utilities for either the protein or encoding nucleic acid molecules. Methods 
such as identification of ligands, use to screen for homologous genes, use to identify 
chromosomes or chromosomal locations, use to recombinantly produce protein or to 
generate antibodies are considered general methods applicable to any protein and/or 
nucleic acid. 

• Applicants' assertion that the claimed polynucleotide can be used in cancer diagnosis, in
particular follicular carcinoma of the thyroid, is unconvincing because the correlation
between the expression of the polynucleotide and follicular carcinoma is based on one
single library. The determination of a cancer marker must be based on studying results
from considerable number of patients, and statistical analysis. See Guidelines for Marker
Development by the National Cancer Institute (NCI). 

The invention at issue is a polynucleotide corresponding to a gene that is expressed in 
humans. The novel polynucleotide codes for a polypeptide demonstrated in the patent 
specification to be a member of the class of glutamate GPCRs, whose biological functions include
control of neurotransmission. The claimed invention has numerous practical, beneficial uses in
toxicology testing, drug development, and the diagnosis of disease, none of which requires
knowledge of how the polypeptide coded for by the polynucleotide actually functions. The
claimed invention can be used, for example, as a marker for cancers of the thyroid, in particular, 
follicular carcinoma. See specification, at page 35. As a result of the benefits of these uses, the 
claimed invention already enjoys significant commercial success. 

Applicants have previously submitted a Declaration by Dr. John C. Rockett showing the 
many reasons why the use of the claimed polynucleotides in gene expression profiling studies in 
toxicology testing would be readily apparent to the skilled artisan at the time the application was 
filed. 

Applicants further submit two additional expert Declarations by Dr. Vishwanath R. Iyer
and Dr. Tod Bedilion under 37 C.F.R. § 1.132, with respective attachments, and ten (10)
scientific references filed before the September 17, 1998 priority date of the instant application.
The Rockett Declaration, Iyer Declaration, Bedilion Declaration, and the ten (10) references fully
establish that, prior to the September 17, 1998 filing date of the parent Bandman '513
application, it was well-established in the art that:

polynucleotides derived from nucleic acids expressed in one or 
more tissues and/or cell types can be used as hybridization probes - that is, as
tools - to survey for and to measure the presence, the absence, and the amount 
of expression of their cognate gene; 

with sufficient length, at sufficient hybridization stringency, and 
with sufficient wash stringency — conditions that can be routinely established — 
expressed polynucleotides, used as probes, generate a signal that is specific to 
the cognate gene, that is, produce a gene-specific expression signal; 

expression analysis is useful, inter alia, in drug discovery and 
lead optimization efforts, in toxicology, particularly toxicology studies 
conducted early in drug development efforts, and in phenotypic
characterization and categorization of cell types, including neoplastic cell
types; 

each additional gene-specific probe used as a tool in expression 
analysis provides an additional gene-specific signal that could not otherwise 
have been detected, giving a more comprehensive, robust, higher resolution, 
statistically more significant, and thus more useful expression pattern in such 
analyses than would otherwise have been possible; 

biologists, such as toxicologists, recognize the increased utility
of more comprehensive, robust, higher resolution, statistically more significant
results, and thus want each newly identified expressed gene to be included in
such an analysis; 

nucleic acid microarrays increase the parallelism of expression 
measurements, providing expression data analogous to that provided by older, 
lower throughput techniques, but at substantially increased throughput; 

accordingly, when expression profiling is performed using 
microarrays, each additional gene-specific probe that is included as a signaling 
component on this analytical device increases the detection range, and thus 
versatility, of this research tool; 

biologists, such as toxicologists, recognize the increased utility
of such improved tools, and thus want a gene-specific probe to each newly 
identified expressed gene to be included in such an analytical device; 

the industrial suppliers of microarrays recognize the increased 
utility of such improved tools to their customers, and thus strive to improve
salability of their microarrays by adding each newly identified expressed gene 
to the microarrays they sell; 

it is not necessary that the biological function of a gene be 
known for measurement of its expression to be useful in drug discovery and
lead optimization analyses, toxicology, or molecular phenotyping experiments; 

failure of a probe to detect changes in expression of its cognate 
gene does not diminish the usefulness of the probe as a research tool; and 

failure of a probe completely to detect its cognate transcript in 
any single expression analysis experiment does not deprive the probe of 
usefulness to the community of users who would use it as a research tool.

The Patent Examiner does not dispute that the claimed polynucleotide can be used as a 
probe in cDNA microarrays and used in gene expression monitoring applications. Instead, the 
Patent Examiner contends that the claimed polynucleotide cannot be useful without precise 
knowledge of its biological function, or the biological function of the polypeptide it encodes. 
But the law has never required knowledge of biological function to prove utility. It is the 
claimed invention's uses, not its functions, that are the subject of a proper analysis under the
utility requirement.

In any event, as demonstrated by the Rockett Declaration, the Iyer Declaration, and the 
Bedilion Declaration, the person of ordinary skill in the art can achieve beneficial results from 
the claimed polynucleotide in the absence of any knowledge as to the precise function of the
protein encoded by it. The uses of the claimed polynucleotide in gene expression monitoring
applications are in fact independent of its precise biological function.

I. The applicable legal standard

To meet the utility requirement of sections 101 and 112 of the Patent Act, the patent
applicant need only show that the claimed invention is "practically useful," Anderson v. Natta,
480 F.2d 1392, 1397, 178 USPQ 458 (CCPA 1973) and confers a "specific benefit" on the
public. Brenner v. Manson, 383 U.S. 519, 534-35, 148 USPQ 689 (1966). As discussed in a
recent Court of Appeals for the Federal Circuit case, this threshold is not high:

An invention is "useful" under section 101 if it is capable of providing some identifiable 
benefit. See Brenner v. Manson, 383 U.S. 519, 534 [148 USPQ 689] (1966); Brooktree
Corp. v. Advanced Micro Devices, Inc., 977 F.2d 1555, 1571 [24 USPQ2d 1401] (Fed.
Cir. 1992) ("to violate Section 101 the claimed device must be totally incapable of
achieving a useful result"); Fuller v. Berger, 120 F. 274, 275 (7th Cir. 1903) (test for 
utility is whether invention "is incapable of serving any beneficial end"). 

Juicy Whip Inc. v. Orange Bang Inc., 51 USPQ2d 1700 (Fed. Cir. 1999).

While an asserted utility must be described with specificity, the patent applicant need not
demonstrate utility to a certainty. In Stiftung v. Renishaw PLC, 945 F.2d 1173, 1180,
20 USPQ2d 1094 (Fed. Cir. 1991), the United States Court of Appeals for the Federal Circuit
explained:

An invention need not be the best or only way to accomplish a certain result, and it need 
only be useful to some extent and in certain applications: "[T]he fact that an invention has
only limited utility and is only operable in certain applications is not grounds for finding
lack of utility." Envirotech Corp. v. Al George, Inc., 730 F.2d 753, 762, 221 USPQ 473,
480 (Fed. Cir. 1984). 

The specificity requirement is not, therefore, an onerous one. If the asserted utility is 
described so that a person of ordinary skill in the art would understand how to use the claimed 
invention, it is sufficiently specific. See Standard Oil Co. v. Montedison, S.p.A., 212 U.S.P.Q.
327, 343 (3d Cir. 1981). The specificity requirement is met unless the asserted utility amounts to
a "nebulous expression" such as "biological activity" or "biological properties" that does not
convey meaningful information about the utility of what is being claimed. Cross v. Iizuka,
753 F.2d 1040, 1048 (Fed. Cir. 1985).

In addition to conferring a specific benefit on the public, the benefit must also be
"substantial." Brenner, 383 U.S. at 534. A "substantial" utility is a practical, "real-world" utility. 
Nelson v. Bowler, 626 F.2d 853, 856, 206 USPQ 881 (CCPA 1980).

If persons of ordinary skill in the art would understand that there is a "well-established" 
utility for the claimed invention, the threshold is met automatically and the applicant need not
make any showing to demonstrate utility. Manual of Patent Examining Procedure at § 706.03(a).
Only if there is no "well-established" utility for the claimed invention must the applicant
demonstrate the practical benefits of the invention. Id. 

Once the patent applicant identifies a specific utility, the claimed invention is presumed 
to possess it. In re Cortright, 165 F.3d 1353, 1357, 49 USPQ2d 1464 (Fed. Cir. 1999); In re 
Brana, 51 F.3d 1560, 1566, 34 USPQ2d 1436 (Fed. Cir. 1995). In that case, the Patent Office
bears the burden of demonstrating that a person of ordinary skill in the art would reasonably
doubt that the asserted utility could be achieved by the claimed invention. Id. To do so, the
Patent Office must provide evidence or sound scientific reasoning. See In re Langer, 503 F.2d
1380, 1391-92, 183 USPQ 288 (CCPA 1974). If and only if the Patent Office makes such a
showing, the burden shifts to the applicant to provide rebuttal evidence that would convince the
person of ordinary skill that there is sufficient proof of utility. Brana, 51 F.3d at 1566. The
applicant need only prove a "substantial likelihood" of utility; certainty is not required. Brenner, 
383 U.S. at 532. 

II. Toxicology Testing and disease diagnosis are sufficient utilities under 35 U.S.C.
§§ 101 and 112, first paragraph 

The claimed invention meets all of the necessary requirements for establishing a credible
utility under the Patent Law: There are "well-established" uses for the claimed invention known
to persons of ordinary skill in the art, and there are specific practical and beneficial uses for the
invention disclosed in the patent application's specification. These uses are explained, in detail,
in the Rockett Declaration, Iyer Declaration, and Second Bedilion Declaration accompanying
this brief or previously submitted. Objective evidence, not considered by the Patent Office, 
further corroborates the credibility of the asserted utilities. 



117940 



8 



09/895,686 



Docket No.: PC-0044 CIP 



A. The use of the claimed SEQ ID NO:1 encoding polynucleotides for toxicology
testing, drug discovery, and disease diagnosis are practical uses that confer 
"specific benefits" to the public 

The claimed invention has specific, substantial, real-world utility by virtue of its use in
toxicology testing, drug development and disease diagnosis through gene expression profiling.
These uses are explained in detail in the accompanying Rockett Declaration, Iyer Declaration,
and Bedilion Declaration, the substance of which is not rebutted by the Patent Examiner. There
is no dispute that the claimed invention is in fact a useful tool in cDNA microarrays used to
perform gene expression analysis. That is sufficient to establish utility for the claimed
polynucleotide.

In his Declaration, Dr. Rockett explains the many reasons why a person skilled in the art 
in 1998 would have understood that any expressed polynucleotide is useful for a number of gene 
expression monitoring applications, e.g., in cDNA microarrays, in connection with the 
development of drugs and the monitoring of the activity of such drugs. (Rockett Declaration at,
e.g., ¶¶ 10-18).

It is my opinion, therefore, based on the state of the art in toxicology at least since 
the mid-1990s . . . that disclosure of the sequence of a new gene or protein, with or 
without knowledge of its biological function, would have been sufficient information 
for a toxicologist to use the gene and/or protein in expression profiling studies in 
toxicology.* [Rockett Declaration, ¶ 18.]

In his Declaration, Dr. Bedilion explains why a person of skill in the art in 1998 would
have understood that any expressed polynucleotide is useful for gene expression monitoring
applications using cDNA microarrays. (Bedilion Declaration, e.g., ¶¶ 4-7.) In his Declaration,
Dr. Iyer explains why a person of skill in the art in 1998 would have understood that any
expressed polynucleotide is useful for gene expression monitoring applications using cDNA
microarrays, stating that "[t]o provide maximum versatility as a research tool, the microarray
should include - and as a biologist I would want my microarray to include - each newly
identified gene as a probe." (Iyer Declaration, ¶ 9.)

* "Use of the words 'it is my opinion' to preface what someone of ordinary skill in the art
would have known does not transform the factual statements contained in the declaration into
opinion testimony." In re Alton, 37 USPQ2d 1578, 1583 (Fed. Cir. 1996).

In addition, Dr. Rockett explains in his Declaration that "there are a number of other
differential expression analysis technologies that precede the development of microarrays, some 
by decades, and that have been applied to drug metabolism and toxicology research, including: 
(1) differential screening; (2) subtractive hybridization, including variants such as chemical 
cross-linking subtraction, suppression-PCR subtractive hybridization and representational 
difference analysis; (3) differential display; (4) restriction endonuclease facilitated analyses, 
including serial analysis of gene expression (SAGE) and gene expression fingerprinting and (5) 
EST analysis." (Rockett Declaration, ¶ 7.)

Nowhere does the Patent Examiner address the fact that, as described on pages 31-32 of 
the Bandman '513 application, the claimed polynucleotides can be used as highly specific probes 
in, for example, cDNA microarrays - probes that without question can be used to measure both 
the existence and amount of complementary RNA sequences known to be the expression 
products of the claimed polynucleotides. The claimed invention is not, in that regard, some 
random sequence whose value as a probe is speculative or would require further research to 
determine. 

Given the fact that the claimed polynucleotide is known to be expressed, its utility as a 
measuring and analyzing instrument for expression levels is as indisputable as a scale's utility for 
measuring weight. This use as a measuring tool, regardless of how the expression level data
ultimately would be used by a person of ordinary skill in the art, by itself demonstrates that the
claimed invention provides an identifiable, real-world benefit that meets the utility requirement.
Raytheon v. Roper, 724 F.2d 951 (Fed. Cir. 1983) (claimed invention need only meet one of its
stated objectives to be useful); In re Cortright, 165 F.3d 1353, 1359 (Fed. Cir. 1999) (how the
invention works is irrelevant to utility); MPEP § 2107 ("Many research tools such as gas
chromatographs, screening assays, and nucleotide sequencing techniques have a clear, specific,
and unquestionable utility (e.g., they are useful in analyzing compounds)" (emphasis added)).

Literature reviews published shortly before the filing of the Bandman '513 application 
describing the state of the art further confirm the claimed invention's utility. Rockett et al. 
confirm, for example, that the claimed invention is useful for differential expression analysis 
regardless of how expression is regulated: 

Despite the development of multiple technological advances which have recently
brought the field of gene expression profiling to the forefront of molecular 
analysis, recognition of the importance of differential gene expression and 
characterization of differentially expressed genes has existed for many years. 

* * * 

Although differential expression technologies are applicable to a broad range of 
models, perhaps their most important advantage is that, in most cases, absolutely 
no prior knowledge of the specific genes which are up- or down-regulated is 
required. 

* * * 

Whereas it would be informative to know the identity and functionality of all 
genes up/down regulated by . . . toxicants, this would appear a longer term goal 
.... However, the current use of gene profiling yields a pattern of gene changes 
for a xenobiotic of unknown toxicity which may be matched to that of well 
characterized toxins, thus alerting the toxicologist to possible in vivo similarities 
between the unknown and the standard, thereby providing a platform for more 
extensive toxicological examination. (emphasis in original)

Rockett et al., Differential gene expression in drug metabolism and toxicology: practicalities,
problems and potential, Xenobiotica 29:655-691 (July 1999) (Rockett Declaration, Exhibit C).

In another pre-September 1998 article, Lashkari et al. state explicitly that sequences that
are merely "predicted" to be expressed (predicted Open Reading Frames, or ORFs) - the claimed
invention in fact is known to be expressed - have numerous uses:

Efforts have been directed toward the amplification of each predicted ORF or any
other region of the genome ranging from a few base pairs to several kilobase
pairs. There are many uses for these amplicons - they can be cloned into standard
vectors or specialized expression vectors, or can be cloned into other specialized
vectors such as those used for two-hybrid analysis. The amplicons can also be
used directly by, for example, arraying onto glass for expression analysis, for
DNA binding assays, or for any direct DNA assay. (emphasis added)

Lashkari et al., Whole genome analysis: Experimental access to all genome sequenced segments
through larger-scale efficient oligonucleotide synthesis and PCR, Proc. Nat. Acad. Sci.
94:8945-8947 (Aug. 1997) (Reference No. 1).

B. The use of polynucleotides coding for polypeptides expressed by humans as
tools for toxicology testing, drug discovery, and the diagnosis of disease is
now "well-established"

The technologies made possible by expression profiling and the DNA tools upon which 
they rely are now well-established. The technical literature recognizes not only the prevalence of 
these technologies, but also their unprecedented advantages in drug development, testing and
safety assessment. These technologies include toxicology testing, e.g., as described by Bedilion,
Rockett, and Iyer in their Declarations. 

Toxicology testing is now standard practice in the pharmaceutical industry. See, e.g.,
John C. Rockett et al., supra:

Knowledge of toxin-dependent regulation in target tissues is not solely an academic
pursuit as much interest has been generated in the pharmaceutical industry to harness this
technology in the early identification of toxic drug candidates, thereby shortening the
developmental process and contributing substantially to the safety assessment of new
drugs. (Rockett Declaration, Exhibit C, page 656) 

To the same effect are several other scientific publications, including Emile F. Nuwaysir et al.,
Microarrays and toxicology: The advent of toxicogenomics, Molecular Carcinogenesis 24:153-159
(1999) (Reference No. 2); Sandra Steiner and N. Leigh Anderson, Expression profiling in
toxicology - potentials and limitations, Toxicology Letters 112-13:467-471 (2000) (Reference
No. 3).

Nucleic acids useful for measuring the expression of whole classes of genes are routinely
incorporated for use in toxicology testing. Nuwaysir et al. describes, for example, a Human
ToxChip comprising 2089 human clones, which were selected

for their well-documented involvement in basic cellular processes as well as their 
responses to different types of toxic insult. Included on this list are DNA replication and
repair genes, apoptosis genes, and genes responsive to PAHs and dioxin-like compounds,
peroxisome proliferators, estrogenic compounds, and oxidant stress. Some of the other
categories of genes include transcription factors, oncogenes, tumor suppressor genes,
cyclins, kinases, phosphatases, cell adhesion and motility genes, and homeobox genes.
Also included in this group are 84 housekeeping genes, whose hybridization intensity is
averaged and used for signal normalization of the other genes on the chip. 

See also Table 1 of Nuwaysir et al. (listing additional classes of genes deemed to be of special
interest in making a human toxicology microarray).






The more genes that are available for use in toxicology testing, the more powerful the 
technique. "Arrays are at their most powerful when they contain the entire genome of the species
they are being used to study." John C. Rockett and David J. Dix, Application of DNA arrays to
toxicology, Environ. Health Perspec. 107:681-685 (1999) (Reference No. 4). Control genes are
carefully selected for their stability across a large set of array experiments in order to best study
the effect of toxicological compounds. See attached email from the primary investigator on the
Nuwaysir paper, Dr. Cynthia Afshari, to an Incyte employee, dated July 3, 2000, as well as the
original message to which she was responding (Reference No. 5), indicating that even the 
expression of carefully selected control genes can be altered. Thus, there is no expressed gene 
which is irrelevant to screening for toxicological effects, and all expressed genes have a utility 
for toxicological screening. 

Further evidence of the well-established utility of all expressed polypeptides and
polynucleotides in toxicology testing is found in U.S. Pat. No. 5,569,588 (Reference No. 9e)
and published PCT applications WO 95/21944 (Reference No. 9a), WO 95/20681
(Reference No. 9b), and WO 97/13877 (Reference No. 9g).

WO 95/21944 ("Differentially expressed genes in healthy and diseased subjects"),
published August 17, 1995, describes the use of microarrays in expression profiling analyses,
emphasizing that patterns of expression can be used to distinguish healthy tissues from diseased
tissues and that patterns of expression can additionally be used in drug development and
toxicology studies, without knowledge of the biological function of the encoded gene product.
In particular, and with emphasis added:

The present invention involves . . . methods for diagnosing diseases . . . 
characterized by the presence of [differentially expressed] . . . genes, despite the 
absence of knowledge about the gene or its function. The methods involve the use
of a composition suitable for use in hybridization which consists of a solid surface
on which is immobilized at pre-defined regions thereon a plurality of defined
oligonucleotide/polynucleotide sequences for hybridization. Each sequence
comprises a fragment of an EST . . . . Differences in hybridization patterns produced
through use of this composition and the specified methods enable diagnosis of
diseases based on differential expression of genes of unknown function . . . .
[abstract] 

The method [of the present invention] involves producing and comparing 
hybridization patterns formed between samples of expressed mRNA or cDNA 
polynucleotide sequences . . . and a defined set of oligonucleotide/polynucleotide[]
. . . immobilized on a support. Those defined [immobilized]
oligonucleotide/polynucleotide sequences are representative of the total expressed
genetic component of the cells, tissues, organs or organism as defined by the
collection of partial cDNA sequences (ESTs). [page 2] 

The present invention meets the unfilled needs in the art by providing 
methods for the . . . use of gene fragments and genes, even those of unknown full 
length sequence and unknown function, which are differentially expressed in a
healthy animal and in an animal having a specific disease or infection by use of
ESTs derived from DNA libraries of healthy and/or diseased/infected animals.
[page 4]

Yet another aspect of the invention is that it provides ... a means for . . . 
monitoring the efficacy of disease treatment regimes including . . . toxicological 
effects thereof. [page 4]

It has been appreciated that one or more differentially identified EST or 
gene-specific oligonucleotide/polynucleotides define a pattern of differentially
expressed genes diagnostic of a predisease, disease or infective state. A knowledge
of the specific biological function of the EST is not required only that the EST[]
identifies a gene or genes whose altered expression is associated reproducibly with
the predisease, disease or infectious state. [page 4]

As used herein, the term 'disease' or 'disease state' refers to any condition 
which deviates from a normal or standardized healthy state in an organism of the 
same species in terms of differential expression of the organism's genes. . . 
[whether] of genetic or environmental origin, for example, an inherited disorder 
such as certain breast cancers. . . .[or] administration of a drug or exposure of the 
animal to another agent, e.g., nutrition, which affects gene expression. [page 5]

As used herein, the term 'solid support' refers to any known substrate which 
is useful for the immobilization of large numbers of oligonucleotide/polynucleotide 
sequences by any available method . . . [and includes, inter alia,] nitrocellulose, . . . 
glass, silica. . . . [page 6] 

By 'EST' or 'Expressed Sequence Tag' is meant a partial DNA or cDNA 
sequence of about 150 to 500, more preferably about 300, sequential nucleotides. . . 
. [page 6] 

One or more libraries made from a single tissue type typically provide at 
least about 3000 different (i.e., unique) ESTs and potentially the full complement of 
all possible ESTs representing all cDNAs e.g., 50,000 to 100,000 in an animal such 
as a human. [page 7]

The lengths of the defined oligonucleotide/polynucleotides may be readily
increased or decreased as desired or needed. . . . The length is generally guided by
the principle that it should be of sufficient length to insure that it is on average
only represented once in the population to be examined. [page 7]

Comparing the . . . hybridization patterns permits detection of those defined
oligonucleotide/polynucleotides which are differentially expressed between the
healthy control and the disease sample by the presence of differences in the
hybridization patterns at pre-defined regions [of the solid support]. [page 13]

It should be appreciated that one does not have to be restricted in using 
ESTs from a particular tissue from which probe RNA or cDNA is obtained[;] rather 
any or all ESTs (known or unknown) may be placed on the support. Hybridization 
will be used [to] form diagnostic patterns or to identify which particular EST is
detected. For example, all known ESTs from an organism are used to produce a
'master' solid support to which control sample and disease samples are alternately
hybridized. [page 14]

Diagnosis is accomplished by comparing the two hybridization patterns,
wherein substantial differences between the first and second hybridization patterns
indicate the presence of the selected disease or infection in the animal being tested.
Substantially similar first and second hybridization patterns indicate the absence of
disease or infection. This[,] like many of the foregoing embodiments[,] may use
known or unknown ESTs derived from many libraries. [page 18]

Still another intriguing use of this method is in the area of monitoring the 
effects of drugs on gene expression, both in laboratories and during clinical trials
with animal[s], especially humans. [page 18]

WO 95/20681 ("Comparative Gene Transcript Analysis"), filed in 1994 by 
Appellants' assignee and published August 3, 1995, has three issued U.S. counterparts: 
U.S. Pat. Nos. 5,840,484, issued November 24, 1998; 6,114,114, issued September 5, 2000; and
6,303,297, issued October 16, 2001. 

The specification describes the use of transcript expression patterns, or "images",
each comprising multiple pixels of gene-specific information, for diagnosis, for cellular
phenotyping, and in toxicology and drug development efforts. The specification describes a
plurality of methods for obtaining the requisite expression data, one of which is microarray
hybridization, and equates the uses of the expression data from these disparate platforms. In
particular, and with emphasis added:

The invention provides a "method and system for quantifying the relative
abundance of gene transcripts in a biological specimen. . . . [G]ene transcript
imaging can be used to detect or diagnose a particular biological state, disease, or
condition which is correlated to the relative abundance of gene transcripts in a
given cell or population of cells. The invention provides a method for comparing
the gene transcript image analysis from two or more different biological specimens
in order to distinguish between the two specimens and identify one or more genes
which are differentially expressed between the two specimens." [abstract]

"[W]e see each individual gene product as a 'pixel' of information which
relates to the expression of that, and only that, gene. We teach herein [] methods
whereby the individual 'pixels' of gene expression information can be combined
into a single gene transcript 'image,' in which each of the individual genes can be
visualized simultaneously and allowing relationships between the gene pixels to be 
easily visualized and understood." [page 2] 

"The present invention avoids the drawbacks of the prior art by providing a 
method to quantify the relative abundance of multiple gene transcripts in a given 
biological specimen . . . . The method of the instant invention provides for detailed 
diagnostic comparisons of cell profiles revealing numerous changes in the 
expression of individual transcripts." [page 6]

"High resolution analysis of gene expression can be used directly as a diagnostic
profile . . . ." [page 7]

"The method is particularly powerful when more than 100 and preferably 
more than 1,000 gene transcripts are analyzed." [page 7] 

"The invention . . . includes a method of comparing specimens containing 
gene transcripts." [page 7]

"The final data values from the first specimen and the further identified
sequence values from the second specimen are processed to generate ratios of 
transcript sequences, which indicate the differences in the number of gene 
transcripts between the two specimens." [i.e., the results yield analogous data to 
microarrays] [page 8] 

"Also disclosed is a method of producing a gene transcript image analysis 
by first obtaining a mixture of mRNA, from which cDNA copies are made." [page 8]

"In a further embodiment, the relative abundance of the gene transcripts in
one cell type or tissue is compared with the relative abundance of gene transcript 
numbers in a second cell type or tissue in order to identify the differences and 
similarities." [page 9]

"In essence, the invention is a method and system for quantifying the
relative abundance of gene transcripts in a biological specimen. The invention
provides a method for comparing the gene transcript image from two or more
different biological specimens in order to distinguish between the two specimens. . . ."
[page 9]

"[T]wo or more gene transcript images can be compared and used to detect 
or diagnose a particular biological state, disease, or condition which is correlated to 
the relative abundance of gene transcripts in a given cell or population of cells." 
[pages 9-10] 

"The present invention provides a method to compare the relative 
abundance of gene transcripts in different biological specimens. . . . This process is 
denoted herein as gene transcript imaging. The quantitative analysis of the relative 
abundance for a set of gene transcripts is denoted herein as 'gene transcript image
analysis' or 'gene transcript frequency analysis'. The present invention allows one
to obtain a profile for gene transcription in any given population of cells or tissue
from any type of organism." [page 11]

"The invention has significant advantages in the fields of diagnostics, 
toxicology and pharmacology, to name a few." [page 12]

"[G]ene transcript sequence abundances are compared against reference
database sequence abundances including normal data sets for diseased and healthy
patients. The patient has the disease(s) with which the patient's data set most
closely correlates." [page 12]

"For example, gene transcript frequency analysis can be used to differentiate 
normal cells or tissues from diseased cells or tissues. . . ." [page 12]

"In toxicology, . . . [g]ene transcript imaging provides highly detailed
information on the cell and tissue environment, some of which would not be
obvious in conventional, less detailed screening methods. The gene transcript
image is a more powerful method to predict drug toxicity and efficacy. Similar
benefits accrue in the use of this tool in pharmacology. . . ." [page 12]

"In an alternative embodiment, comparative gene transcript frequency
analysis is used to differentiate between cancer cells which respond to anti-cancer 
agents and those which do not respond." [page 12]

"In a further embodiment, comparative gene transcript frequency analysis is
used ... for the selection of better pharmacologic animal models." [page 14]

"In a further embodiment, comparative gene transcript frequency analysis is
used in a clinical setting to give a highly detailed gene transcript profile of a 
diseased state or condition." [page 14]

"An alternate method of producing a gene transcript image includes the
steps of obtaining a mixture of test mRNA and providing a representative array of
unique probes whose sequences are complementary to at least some of the test
mRNAs. Next, a fixed amount of the test mRNA is added to the arrayed probes.
The test mRNA is incubated with the probes for a sufficient time to allow hybrids
of the test mRNA and probes to form. The mRNA-probe hybrids are detected and
the quantity determined." [page 15]

"[T]his research tool provides a way to get new drugs to the public faster
and more economically." [page 36]

"In this method, the particular physiologic function of the protein transcript
need not be determined to qualify the gene transcript as a clinical marker." [page 38]

"[T]he gene transcript changes noted in the earlier rat toxicity study are
carefully evaluated as clinical markers in the followed patients. Changes in the 
gene transcript image analyses are evaluated as indicators of toxicity by correlation 
with clinical signs and symptoms and other laboratory results. . . . The . . . analysis 
highlights any toxicological changes in the treated patients." [page 39] 

U.S. Pat. No. 5,569,588 ("Methods for Drug Screening") ("the '588 patent"),
issued October 29, 1996, with a priority date of August 1995, describes an expression profiling
platform, the "genome reporter matrix", which is different from nucleic acid microarrays.
Additionally describing use of nucleic acid microarrays, the '588 patent makes clear that the
utility of comparing multidimensional expression datasets is independent of the methods by
which such profiles are obtained. The '588 patent speaks clearly to the usefulness of such
expression analyses in drug development and toxicology, particularly pointing out that a gene's
failure to change in expression level is a useful result. Thus, with emphasis added:

The invention provides "[m]ethods and compositions for modeling the
transcriptional responsiveness of an organism to a candidate drug. . . . [The final 
step of the method comprises] comparing reporter gene product signals for each cell 
before and after contacting the cell with the candidate drug to obtain a drug 
response profile which provides a model of the transcriptional responsiveness of 
said organism to the candidate drug." [abstract] 

"The present invention exploits the recent advances in genome science to 
provide for the rapid screening of large numbers of compounds against a systemic 
target comprising substantially all targets in a pathway [or] organism." [col. 1]

"The ensemble of reporting cells comprises as comprehensive a collection
of transcription regulatory genetic elements as is conveniently available for the
targeted organism so as to most accurately model the systemic transcriptional
response. Suitable ensembles generally comprise thousands of individually
reporting elements; preferred ensembles are substantially comprehensive, i.e.
provide a transcriptional response diversity comparable to that of the target
organism. Generally, a substantially comprehensive ensemble requires transcription
regulatory genetic elements from at least a majority of the organism's genes, and
preferably includes those of all or nearly all of the genes. We term such a
substantially comprehensive ensemble a genome reporter matrix." [col. 2]

"Drugs often have side effects that are in part due to the lack of target 
specificity. ... [A] genome reporter matrix reveals the spectrum of other genes in 
the genome also affected by the compound. In considering two different 
compounds both of which induce the ERG10 reporter, if one compound affects the
expression of 5 other reporters and a second compound affects the expression of 50
other reporters, the first compound is, a priori, more likely to have fewer side
effects." [cols. 2-3]

"Furthermore, it is not necessary to know the identity of any of the
responding genes." [col. 3]

"[A]ny new compound that induces the same response profile as [a] . . . 
dominant tubulin mutant would provide a candidate for a taxol-like 
pharmaceutical." [col. 4] 

"The genome reporter matrix offers a simple solution to recognizing new 
specificities in combinatorial libraries. Specifically, pools of new compounds are 
tested as mixtures across the matrix. If the pool has any new activity not present in 
the original lead compound, new genes are affected among the reporters." [col. 4]

"A sufficient number of different recombinant cells are included to provide
an ensemble of transcriptional regulatory elements of said organism sufficient to 
model the transcriptional responsiveness of said organism to a drug. In a preferred 
embodiment, the matrix is substantially comprehensive for the selected regulatory 
elements, e.g. essentially all of the gene promoters of the targeted organism are 
included." [cols. 6-7] 

"In a preferred embodiment, the basal response profiles are determined. . . . 
The resultant electrical output signals are stored in a computer memory as genome 
reporter output signal matrix data structure associating each output signal with the 
coordinates of the corresponding microtiter plate well and the stimulus or drug.
This information is indexed against the matrix to form reference response profiles
that are used to determine the response of each reporter to any milieu in which a
stimulus may be provided. After establishing a basal response profile for the
matrix, each cell is contacted with a candidate drug. The term drug is used loosely
to refer to agents which can provoke a specific cellular response. . . . The drug
induces a complex response pattern of repression, silence and induction across the
matrix. . . . The response profile reflects the cell's transcriptional adjustments to
maintain homeostasis in the presence of the drug. . . . After contacting the cells with
the candidate drug, the reporter gene product signals from each of said cells is again
measured to determine a stimulated response profile. The basal o[r] background
response profile is then compared with ... the stimulated response profile to
identify the cellular response profile to the candidate drug." [cols. 7-8]

"In another embodiment of the invention, a matrix [i.e., array] of
hybridization probes corresponding to a predetermined population of genes of the 
selected organism is used to specifically detect changes in gene transcription which 
result from exposing the selected organism or cells thereof to a candidate drug. In 
this embodiment, one or more cells derived from the organism is exposed to the 
candidate drug in vivo or ex vivo under conditions wherein the drug effects a
change in gene transcription in the cell to maintain homeostasis. Thereafter, the
gene transcripts, primarily mRNA, of the cell or cells is isolated . . . [and] then
contacted with an ordered matrix [array] of hybridization probes, each probe being
specific for a different one of the transcripts, under conditions where each of the
transcripts hybridizes with a corresponding one of the probes to form hybridization
pairs. The ordered matrix of probes provides, in aggregate, complements for an
ensemble of genes of the organism sufficient to model the transcriptional
responsiveness of the organism to a drug. . . . The matrix-wide signal profile of the
drug-stimulated cells is then compared with a matrix-wide signal profile of negative
control cells to obtain a specific drug response profile." [col. 8]

"The invention also provides means for computer-based qualitative analysis
of candidate drugs and unknown compounds. A wide variety of reference response
profiles may be generated and used in such analyses." [col. 8]

"Response profiles for an unknown stimulus (e.g. new chemicals, unknown
compounds or unknown mixtures) may be analyzed by comparing the new stimulus
response profiles with response profiles to known chemical stimuli." [col. 9]

"The response profile of a new chemical stimulus may also be compared to 
a known genetic response profile for target gene(s)." [col. 9] 

The August 11, 1997 press release from the '588 patent's assignee, Acacia Biosciences
(now part of Merck) (reference "9h" attached hereto), and the September 15, 1997 news report by
Glaser, "Strategies for Target Validation Streamline Evaluation of Leads," Genetic Engineering
News (reference "9i" attached hereto), attest the commercial value of the methods and technology
described and claimed in the '588 patent.

WO 97/13877 ("Measurement of Gene Expression Profiles in Toxicity
Determinations"), published April 17, 1997, describes an expression profiling technology 
differing somewhat from the use of cDNA microarrays and differing from the genome reporter 
matrix of the '588 patent; but the use of the data is analogous. As per its title, the reference 
describes use of expression profiling in toxicity determinations. In particular, and with emphasis 
added: 

"[T]he invention relates to a method for detecting and monitoring changes 
in gene expression patterns in in vitro and in vivo systems for determining the
toxicity of drug candidates." [Field of the invention]

"An object of the invention is to provide a new approach to toxicity
assessment based on an examination of gene expression patterns, or profiles, in in
vitro or in vivo test systems." [page 3]

"Another object of the invention is to provide a rapid and reliable method 
for correlating gene expression with short term and long term toxicity in test 
animals." [page 3] 

"The invention achieves these and other objects by providing a method for 
massively parallel signature sequencing of genes expressed in one or more selected 
tissues of an organism exposed to a test compound. An important feature of the 
invention is the application of novel ... methodologies that permit the formation of 
gene expression profiles for selected tissues .... Such profiles may be compared 
with those from tissues of control organisms at single or multiple time points to 
identify expression patterns predictive of toxicity." [page 3]

"As used herein, the terms 'gene expression profile,' and 'gene expression
pattern' which is used equivalently, means a frequency distribution of sequences of
portions of cDNA molecules sampled from a population of tag-cDNA conjugates. . . .
Preferably, the total number of sequences determined is at least 1000; more
preferably, the total number of sequences determined in a gene expression profile is
at least ten thousand." [page 7]

"The invention provides a method for determining the toxicity of a
compound by analyzing changes in the gene expression profiles in selected tissues
of test organisms exposed to the compound. . . . Gene expression profiles derived
from test organisms are compared to gene expression profiles derived from control
organisms. . . ." [page 7]

Therefore, the potential benefit to the public, in terms of lives saved and reduced health
care costs, is enormous. Evidence of the benefits of this information includes:

• In 1999, CV Therapeutics, an Incyte collaborator, was able to use Incyte gene
expression technology, information about the structure of a known transporter
gene, and chromosomal mapping location, to identify the key gene associated
with Tangier disease. This discovery took place over a matter of only a few
weeks, due to the power of these new genomics technologies. The discovery
received an award from the American Heart Association as one of the top 10
discoveries associated with heart disease research in 1999.

• In an April 9, 2000, article published by the Bloomberg news service, an Incyte
customer stated that it had reduced the time associated with target discovery and
validation from 36 months to 18 months, through use of Incyte's genomic
information database. Other Incyte customers have privately reported similar
experiences. The implications of this significant saving of time and expense for
the number of drugs that may be developed and their cost are obvious.

• In a February 10, 2000, article in the Wall Street Journal, one Incyte customer
stated that over 50 percent of the drug targets in its current pipeline were derived
from the Incyte database. Other Incyte customers have privately reported similar
experiences. By doubling the number of targets available to pharmaceutical
researchers, Incyte genomic information has demonstrably accelerated the
development of new drugs.

Because the Patent Examiner failed to address or consider the "well-established" utilities 
for the claimed invention in toxicology testing, drug development, and the diagnosis of disease, 
the Examiner's rejections should be overturned regardless of their merit. 

C. The uncontested fact that the claimed polynucleotide encodes a protein in the
GPCR family also demonstrates utility

In addition to having substantial, specific and credible utilities in numerous gene
expression monitoring applications, it is undisputed that the claimed polynucleotide encodes for
a protein having the sequence shown as SEQ ID NO:1 in the patent application. Appellants have
demonstrated that SEQ ID NO:1 is a member of the GPCR family, and that the GPCR family of
proteins includes glutamate GPCRs that function in neurotransmission, and play a role in certain
neurological disorders.

The Patent Examiner does not dispute any of the facts set forth in the previous paragraph. 
Neither does the Patent Examiner dispute that, if a polynucleotide encodes for a protein that has a 
substantial, specific and credible utility, then it follows that the polynucleotide also has a
substantial, specific and credible utility.

The Examiner must accept the applicant's demonstration that the polypeptide encoded by 
the claimed invention is a member of the GPCR family and that utility is proven by a reasonable 
probability unless the Examiner can demonstrate through evidence or sound scientific reasoning 
that a person of ordinary skill in the art would doubt utility. See In re Langer, 503 F.2d 1380,
1391-92, 183 USPQ 288 (CCPA 1974). The Examiner has not provided sufficient evidence or 
sound scientific reasoning to the contrary. 

Nor has the Examiner provided any evidence that any member of the GPCR family, let 
alone a substantial number of those members, is not useful. In such circumstances, the only 
reasonable inference is that the polypeptide encoded by the claimed invention must be, like the 
other members of the GPCR family, useful.

D. Objective evidence corroborates the utilities of the claimed invention

There is, in fact, no restriction on the kinds of evidence a Patent Examiner may consider 
in determining whether a "real-world" utility exists. "Real-world" evidence, such as evidence 
showing actual use or commercial success of the invention, can demonstrate conclusive proof of
utility. Raytheon v. Roper, 220 USPQ 592 (Fed. Cir. 1983); Nestle v. Eugene, 55 F.2d 854,
856, 12 USPQ 335 (6th Cir. 1932). Indeed, proof that the invention is made, used or sold by any
person or entity other than the patentee is conclusive proof of utility. United States Steel Corp.
v. Phillips Petroleum Co., 865 F.2d 1247, 1252, 9 USPQ2d 1461 (Fed. Cir. 1989).

Over the past several years, a vibrant market has developed for databases containing the 
sequences of all expressed genes (along with the polypeptide translations of those genes), in 
particular genes having medical and pharmaceutical significance such as the instant sequence. 
(Note that the value in these databases is enhanced by their completeness, but each sequence in 
them is independently valuable.) The databases sold by Appellants' assignee, Incyte, include 
exactly the kinds of information made possible by the claimed invention, such as tissue and 
disease associations. Incyte sells its database containing the claimed sequence and millions of 
other sequences throughout the scientific community, including to pharmaceutical companies 
who use the information to develop new pharmaceuticals. 

Both Incyte's customers and the scientific community have acknowledged that Incyte's
databases have proven to be valuable in, for example, the identification and development of drug
candidates. Page et al., in discussing the identification and assignment of candidate drug targets,
state that "rapid identification and assignment of candidate targets and markers represents a huge
challenge ... [t]he process of annotation is similarly aided by the quantity and richness of the
sequence specific databases that are currently available, both in the public domain and in the
private sector (e.g. those supplied by Incyte Pharmaceuticals)" (Page, M.J. et al., "Proteomics: a
major new technology for the drug discovery process," Drug Discov. Today 4:55-62 (1999)
(Reference No. 6), see page 58, col. 2). As Incyte adds information to its databases, including
the information that can be generated only as a result of Incyte's invention of the claimed 
polynucleotide and its use of that polynucleotide on cDNA microarrays, the databases become 
even more powerful tools. Thus the claimed invention adds more than incremental benefit to the 
drug discovery and development process. 

III. The Patent Examiner's rejections are without merit

Rather than responding to the evidence demonstrating utility, the Examiner attempts to 
dismiss it altogether by arguing that the disclosed and well-established utilities for the claimed 
polynucleotide are not "specific, substantial, and credible" utilities. (Final Office Action at page 
3). The Examiner is incorrect both as a matter of law and as a matter of fact. 

A. The precise biological role or function of an expressed polynucleotide is not 
required to demonstrate utility 

The Patent Examiner's primary rejection of the claimed invention is based on the ground 
that, without information as to the precise "biological role" of the claimed invention, the claimed 
invention's utility is not sufficiently specific. According to the Examiner, it is not enough that a 
person of ordinary skill in the art could use and, in fact, would want to use the claimed invention 
either by itself or in a cDNA microarray to monitor the expression of genes for such applications
as the evaluation of a drug's efficacy and toxicity. The Examiner would require, in addition, that
the applicant provide a specific and substantial interpretation of the results generated in any 
given expression analysis. 

It may be that specific and substantial interpretations and detailed information on
biological function are necessary to satisfy the requirements for publication in some technical
journals, but they are not necessary to satisfy the requirements for obtaining a United States
patent. The relevant question is not, as the Examiner would have it, whether it is known how or
why the invention works, In re Cortright, 165 F.3d 1353, 1359 (Fed. Cir. 1999), but rather
whether the invention provides an "identifiable benefit" in presently available form. Juicy Whip
Inc. v. Orange Bang Inc., 185 F.3d 1364, 1366 (Fed. Cir. 1999). If the benefit exists, and there is
a substantial likelihood the invention provides the benefit, it is useful. There can be no doubt,
particularly in view of the First Bedilion Declaration (at, e.g., ¶¶ 10 and 15), that the present
invention meets this test. 

The threshold for determining whether an invention produces an identifiable benefit is 
low. Juicy Whip, 185 F.3d at 1366. Only those utilities that are so nebulous that a person of 
ordinary skill in the art would not know how to achieve an identifiable benefit and, at least
according to the PTO guidelines, so-called "throwaway" utilities that are not directed to a person
of ordinary skill in the art at all, do not meet the statutory requirement of utility. Utility
Examination Guidelines, 66 Fed. Reg. 1092 (Jan. 5, 2001). 

Knowledge of the biological function or role of a biological molecule has never been
required to show real-world benefit. In its most recent explanation of its own utility guidelines,
the PTO acknowledged as much (66 F.R. at 1095):

[T]he utility of a claimed DNA does not necessarily depend on the function of the 
encoded gene product. A claimed DNA may have specific and substantial utility 
because, e.g., it hybridizes near a disease-associated gene or it has gene-regulating
activity. 

By implicitly requiring knowledge of biological function for any claimed nucleic acid,
the Examiner has, contrary to law, elevated what is at most an evidentiary factor into an absolute
requirement of utility. Rather than looking to the biological role or function of the claimed 
invention, the Examiner should have looked first to the benefits it is alleged to provide. 

B. Membership in a class of useful products can be proof of utility 

Despite the uncontradicted evidence that the claimed polynucleotide encodes a 
polypeptide in the GPCR family, the Examiner refused to impute the utility of the members of 
the GPCR family to SEQ ID NO:1. In the Final Office Action, the Patent Examiner takes the
position that, unless Appellants can identify which particular biological function within the class
of GPCRs is possessed by SEQ ID NO:1, utility cannot be imputed. See Final Office Action,
page 4. To demonstrate utility by membership in the class of GPCRs, the Examiner would
require that all GPCRs possess a "common" utility.

There is no such requirement in the law. In order to demonstrate utility by membership
in a class, the law requires only that the class not contain a substantial number of useless 
members. So long as the class does not contain a substantial number of useless members, there 
is sufficient likelihood that the claimed invention will have utility, and a rejection under 
35 U.S.C. § 101 is improper. That is true regardless of how the claimed invention ultimately is 
used and whether or not the members of the class possess one utility or many. See Brenner v. 
Manson, 383 U.S. 519, 532 (1966); Application of Kirk, 376 F.2d 936, 943 (CCPA 1967). 

Membership in a "general" class is insufficient to demonstrate utility only if the class 
contains a sufficient number of useless members such that a person of ordinary skill in the art 
could not impute utility by a substantial likelihood. There would be, in that case, a substantial 
likelihood that the claimed invention is one of the useless members of the class. In the few cases
in which class membership did not prove utility by substantial likelihood, the classes did in fact
include predominately useless members. E.g., Brenner (man-made steroids); Kirk (same); Natta
(man-made polyethylene polymers). 

The Examiner addresses GPCRs as if the general class in which it is included is not the 
GPCR family, but rather all polynucleotides or all polypeptides, including the vast majority of 
useless theoretical molecules not occurring in nature, and thus not pre-selected by nature to be
useful. While these "general classes" may contain a substantial number of useless members, the 
GPCR family does not. The GPCR family is sufficiently specific to rule out any reasonable 
possibility that SEQ ID NO: 1 would not also be useful like the other members of the family. 

Because the Examiner has not presented any evidence that the GPCR class of signaling 
molecules has any, let alone a substantial number, of useless members, the Examiner must 
conclude that there is a "substantial likelihood" that the SEQ ID NO:1 encoded by the claimed 
polynucleotide is useful. It follows that the claimed polynucleotide also is useful. 

C. Because the uses of the claimed polynucleotide in toxicology testing, drug 
discovery, and disease diagnosis are practical uses beyond mere study of the 
invention itself, the claimed invention has substantial utility 

The Examiner's rejection of the claims at issue as not having a "substantial" use is 
tantamount to a rejection based on an allegation that the only use of the claimed invention is as a 
tool for further research. Because the PTO's rejection assumes a substantial overstatement of 
the law, and is incorrect in fact, it must be overturned. 

There is no authority for the proposition that use as a tool for research is not a substantial 
utility. Indeed, the Patent Office has recognized that just because an invention is used in a 
research setting does not mean that it lacks utility (Section 2107.01 of the Manual of Patent 
Examining Procedure, 8th Edition, August 2001, under the heading I. Specific and Substantial 
Requirements, Research Tools): 

Many research tools such as gas chromatographs, screening assays, and nucleotide 
sequencing techniques have a clear, specific and unquestionable utility (e.g., they are 
useful in analyzing compounds). An assessment that focuses on whether an invention is 
useful only in a research setting thus does not address whether the specific invention is in 
fact "useful" in a patent sense. Instead, Office personnel must distinguish between 
inventions that have a specifically identified utility and inventions whose specific utility 
requires further research to identify or reasonably confirm. 

The Patent Office's actual practice has been, at least until the present, consistent with that 
approach. It has routinely issued patents for inventions whose only use is to facilitate research, 
such as DNA ligases. These are acknowledged by the PTO's Training Materials themselves to 
be useful, as well as DNA sequences used, for example, as markers. 

Only a limited subset of research uses are not "substantial" utilities: those in which the 
only known use for the claimed invention is to be an object of further study, thus merely inviting 
further research. This follows from Brenner, in which the U.S. Supreme Court held that a 
process for making a compound does not confer a substantial benefit where the only known use 
of the compound was to be the object of further research to determine its use. Id. at 535. 
Similarly, in Kirk, the Court held that a compound would not confer substantial benefit on the 
public merely because it might be used to synthesize some other, unknown compound that would 
confer substantial benefit. Kirk, 376 F.2d at 940, 945 ("What appellants are really saying to 
those in the art is take these steroids, experiment, and find what use they do have as medicines."). 
Nowhere do those cases state or imply, however, that a material cannot be patentable if it has 
some other beneficial use in research. 
D. The Patent Examiner failed to demonstrate that a person of ordinary skill in 
the art would reasonably doubt the utility of the claimed invention 

The Examiner alleges that applicants' asserted use of the claimed polynucleotide in the 
detection and diagnosis of cancer, in particular, thyroid cancer, is based on a correlation with 
thyroid cancer in a single library representing follicular carcinoma of the thyroid. See 
specification, at page 35. Applicants reiterate that the asserted utility for the polynucleotide 
encoding SEQ ID NO:1 in the detection and diagnosis of follicular carcinoma of the thyroid, 
based on a significant (4-fold) differential expression in that disease condition, is specific, 
substantial, and credible. The Examiner's allegation that the asserted utility is not credible 
because it is based on expression of the transcript in only one library ignores the fact that a 
number of thyroid libraries were examined, representing both normal and diseased thyroid, and 
that only libraries associated with thyroid cancer were found to express the gene. In particular, 
the gene was most highly expressed in a thyroid follicular carcinoma tumor library 
(THYRTUP02), but was also expressed in a library associated with follicular adenoma 
(THYRNOT03), a precancerous condition to follicular carcinoma. Such evidence provides more 
than a "substantial likelihood" that the polynucleotide may be used in the detection and diagnosis 
of the disease. Further, the evidence provided from the Northern analysis for SEQ ID NO:7 
supports applicants' assertion of the use of the claimed polynucleotide in cancer as disclosed in 
the Bandman '513 priority application at pages 29-30. The Examiner's reliance on references 
such as the NCI Guidelines for Marker Development to support her position is merely an attempt 
to raise the standard for utility to one of near certainty. However, the standard applicable in this 
case is not proof to certainty, but rather proof to reasonable probability. Brenner, 383 U.S. at 
532. 

Applicants' Showing of Facts Overcomes The Examiner's Concern That 
Applicants' Invention Lacks "Specific Utility" 

The Examiner alleges that the claimed invention is not supported by either a specific and 
substantial asserted utility or a well established utility. (Final Office Action, page 3.) 

Appellants' submission of additional facts overcomes this concern. Those facts 
demonstrate that, far from applying regardless of the specific properties of the claimed 
invention, the utility of Appellants' claimed polynucleotides as gene-specific probes depends 
upon specific properties of the polynucleotides, that is, their nucleic acid sequences. 

"[E]ach probe on ... [a "high density spotted microarray[]"], with careful design and 
sufficient length, and with sufficiently stringent hybridization and wash conditions, binds 
specifically and with minimal cross-hybridization, to the probe's cognate transcript"; ^ "[e]ach 
gene included as a probe on a microarray provides a signal that is specific to the cognate 
transcript, at least to a first approximation." ^ Accordingly, "each additional probe makes an 
additional transcript newly detectable by the microarray, increasing the detection range, and thus 
versatility, of this analytical device for gene expression profiling"; ^ equally, "[e]ach new 
gene-specific probe added to a microarray thus increases the number of genes detectable by the 
device, increasing the resolving power of the device." ^ 

Although not required for present purposes, it would be appropriate to state on the record 
here that the specificity of nucleic acid hybridization was well-established far earlier than the 
development of high density spotted microarrays in 1995, and indeed is the well-established 
underpinning of many, perhaps most, molecular biological techniques developed over the past 
30-40 years. 

IV. By requiring the patent applicant to assert a particular or unique utility, the Patent 
Examination Utility Guidelines and Training Materials applied by the Patent 
Examiner misstate the law 

There is an additional, independent reason to overturn the rejections: to the extent the 
rejections are based on the Revised Interim Utility Examination Guidelines (64 FR 71427, 
December 21, 1999), the final Utility Examination Guidelines (66 FR 1092, January 5, 2001) 
and/or the Revised Interim Utility Guidelines Training Materials (USPTO Website 
www.uspto.gov, March 1, 2000), the Guidelines and Training Materials are themselves 
inconsistent with the law. 



^ Declaration of Dr. John C. Rockett, ¶ 10(1), emphasis added. 

^ Declaration of Dr. Vishwanath R. Iyer, ¶ 7 (emphasis added). See the footnote at ¶ 7 for a slightly more 
"nuanced" view. 

^ Declaration of Dr. John C. Rockett, ¶ 10(11). 

^ Declaration of Dr. Vishwanath R. Iyer, ¶ 7. 
The Training Materials, which direct the Examiners regarding how to apply the Utility 
Guidelines, address the issue of specificity with reference to two kinds of asserted utilities: 
"specific" utilities which meet the statutory requirements, and "general" utilities which do not. 
The Training Materials define a "specific utility" as follows: 

A [specific utility] is specific to the subject matter claimed. This contrasts to general 
utility that would be applicable to the broad class of invention. For example, a claim to a 
polynucleotide whose use is disclosed simply as "gene probe" or "chromosome marker" 
would not be considered to be specific in the absence of a disclosure of a specific DNA 
target. Similarly, a general statement of diagnostic utility, such as diagnosing an 
unspecified disease, would ordinarily be insufficient absent a disclosure of what condition 
can be diagnosed. 

The Training Materials distinguish between "specific" and "general" utilities by assessing 
whether the asserted utility is sufficiently "particular," i.e., unique (Training Materials at page 
52) as compared to the "broad class of invention." (In this regard, the Training Materials appear 
to parallel the view set forth in Stephen G. Kunin, Written Description Guidelines and Utility 
Guidelines, 82 J.P.T.O.S. 77, 97 (Feb. 2000) ("With regard to the issue of specific utility the 
question to ask is whether or not a utility set forth in the specification is particular to the claimed 
invention.")). 

Such "unique" or "particular" utilities never have been required by the law. To meet the 
utility requirement, the invention need only be "practically useful," Natta, 480 F.2d at 1397, 
and confer a "specific benefit" on the public. Brenner, 383 U.S. at 534. Thus, incredible 
"throw-away" utilities, such as trying to "patent a transgenic mouse by saying it makes great snake 
food," do not meet this standard. Karen Hall, Genomic Warfare, The American Lawyer 68 (June 
2000) (quoting John Doll, Chief of the Biotech Section of USPTO). 

This does not preclude, however, a general utility, contrary to the statement in the 
Training Materials where "specific utility" is defined (page 5). Practical real-world uses are not 
limited to uses that are unique to an invention. The law requires that the practical utility be 
"definite," not particular. Montedison, 664 F.2d at 375. Appellants are not aware of any court 
that has rejected an assertion of utility on the grounds that it is not "particular" or "unique" to the 
specific invention. Where courts have found utility to be too "general," it has been in those cases 
in which the asserted utility in the patent disclosure was not a practical use that conferred a 
specific benefit. That is, a person of ordinary skill in the art would have been left to guess as to 
how to benefit at all from the invention. In Kirk, for example, the CCPA held the assertion that a 
man-made steroid had "useful biological activity" was insufficient where there was no 
information in the specification as to how that biological activity could be practically used. 
Kirk, 376 F.2d at 941. 

The fact that an invention can have a particular use does not provide a basis for requiring 
a particular use. See Brana, supra (disclosure describing a claimed antitumor compound as 
being homologous to an antitumor compound having activity against a "particular" type of 
cancer was determined to satisfy the specificity requirement). "Particularity" is not and never has 
been the sine qua non of utility; it is, at most, one of many factors to be considered. 

As described supra, broad classes of inventions can satisfy the utility requirement so long 
as a person of ordinary skill in the art would understand how to achieve a practical benefit from 
knowledge of the class. Only classes that encompass a significant portion of nonuseful members 
would fail to meet the utility requirement. Montedison, 664 F.2d at 374-75. 

The Training Materials fail to distinguish between broad classes that convey information 
of practical utility and those that do not, lumping all of them into the latter, unpatentable 
category of "general" utilities. As a result, the Training Materials paint with too broad a brush. 
Rigorously applied, they would render unpatentable whole categories of inventions that 
heretofore have been considered to be patentable and that have indisputably benefitted the public, 
including the claimed invention. See supra § II.B. Thus the Training Materials cannot be 
applied consistently with the law. 

V. To the extent the rejection of the claimed invention under 35 U.S.C. § 112, first 
paragraph, is based on the improper rejection for lack of utility under 35 U.S.C. 
§ 101, it must be reversed. 

The rejection set forth in the Office Action is based on the assertions discussed above, 
i.e., that the claimed invention lacks patentable utility. To the extent that the rejection under 35 
U.S.C. § 112, first paragraph, is based on the improper allegation of lack of patentable utility 
under 35 U.S.C. § 101, it fails for the same reasons. 



117940 



31 



09/895,686 



Docket No.: PC-0044 CIP 



CONCLUSION 

Appellants respectfully submit that rejections for lack of utility based, inter alia, on an 
allegation of "lack of specificity," as set forth in the Office Action and as justified in the Revised 
Interim and final Utility Guidelines and Training Materials, are not supported in the law. Neither 
are they scientifically correct, nor supported by any evidence or sound scientific reasoning. 
These rejections are alleged to be founded on facts in court cases such as Brenner and Kirk, yet 
those facts are clearly distinguishable from the facts of the instant application, and indeed most if 
not all nucleotide and protein sequence applications. Nevertheless, the PTO is attempting to 
mold the facts and holdings of these prior cases, "like a nose of wax," ^ to target rejections of 
claims to polypeptide and polynucleotide sequences, where biological activity information has 
not been proven by laboratory experimentation, and they have done so by ignoring perfectly 
acceptable utilities fully disclosed in the specifications as well as well-established utilities known 
to those of skill in the art. As is disclosed in the specification, and even more clearly, as one of 
ordinary skill in the art would understand, the claimed invention has well-established, specific, 
substantial and credible utilities. The rejections are, therefore, improper and should be reversed. 

Moreover, to the extent the above rejections were based on the Revised Interim and final 
Examination Guidelines and Training Materials, those portions of the Guidelines and Training 
Materials that form the basis for the rejections should be determined to be inconsistent with the 
law. 

Claims 1-6 stand rejected under 35 U.S.C. § 112, first paragraph, as containing subject 
matter which is not described in the specification in such a way as to reasonably convey to one 
skilled in the relevant art that the inventor(s), at the time the application was filed, had 
possession of the claimed invention. The rejection alleges in particular, that: 

while the specification describes a polypeptide sequence consisting of SEQ ID NO:1, the 
claims encompass polypeptides comprising fragments and homologues that vary 
substantially in length and also in amino acid composition. The instant disclosure of a 
single polypeptide, that of SEQ ID NO:1, does not support the scope of the claimed 
genus, which encompasses a substantial variety of subgenera. See Regents of the 
University of California v. Eli Lilly with respect to the premise that "A description of a 
genus of cDNAs may be achieved by means of a recitation of a representative number of 
cDNAs, defined by nucleotide sequence, falling within the scope of the genus, or a 
recitation of structural features common to the genus, which features constitute a 
substantial portion of the genus". The Examiner then cited various references alleged to 
support the unpredictability of protein function based on sequence homology. See, in 
particular, Vukicevic et al.; Tischer et al.; and Kopchick et al. The Examiner concluded 
by saying that given the unpredictability of homology comparisons, and the fact that the 
specification fails to provide objective evidence that the additional sequences are indeed 
species of the claimed genus, it cannot be established that a representative number of 
species have been disclosed by the claims. Further, the Examiner stated, no activity is set 
forth for the additional sequences. 

^ "The concept of patentable subject matter under § 101 is not 'like a nose of wax which 
may be turned and twisted in any direction * * *.' White v. Dunbar, 119 U.S. 47, 51." (Parker v. 
Flook, 198 USPQ 193 (US SupCt 1978)) 

The recited fragments and variants of SEQ ID NO:1 and SEQ ID NO:2 are 
sufficiently described in chemical and structural terms that the skilled artisan would 
recognize applicant's possession of them at the time the application was filed 

With respect to fragments of SEQ ID NO:1, as recited in claim 1, applicants submit that 
the recited fragments are disclosed in the specification and claims in terms of their specific 
amino acid sequences and therefore clearly meet the requirements for written description under 
35 U.S.C. § 112, first paragraph. 

The claimed "homologues" of SEQ ID NO:1 referred to by the Examiner presumably 
relate to variants of SEQ ID NO:1 and SEQ ID NO:7, as recited in claims 1 and 2, respectively. 
Applicants submit that the polypeptides and polynucleotides of the invention, including the 
recited variants, are adequately described in accordance with 35 U.S.C. § 112, first paragraph, 
and supported by relevant case law, some of which is referred to by the Examiner. 

The requirements necessary to fulfill the written description requirement of 35 U.S.C. 
§ 112, first paragraph, are well established by case law. 

... the applicant must also convey with reasonable clarity to those skilled 
in the art that, as of the filing date sought, he or she was in possession of the 
invention. The invention is, for purposes of the "written description" inquiry, 
whatever is now claimed. Vas-Cath, Inc. v. Mahurkar, 19 USPQ2d 1111, 1117 
(Fed. Cir. 1991) 

Attention is also drawn to the Patent and Trademark Office's own "Guidelines for 
Examination of Patent Applications Under the 35 U.S.C. Sec. 112, para. 1", published January 5, 
2001, which provide that: 

An applicant may also show that an invention is complete by disclosure of 
sufficiently detailed, relevant identifying characteristics which provide evidence 
that applicant was in possession of the claimed invention, i.e., complete or partial 
structure, other physical and/or chemical properties, functional characteristics 
when coupled with a known or disclosed correlation between function and 
structure, or some combination of such characteristics. What is conventional or 
well known to one of ordinary skill in the art need not be disclosed in detail. If a 
skilled artisan would have understood the inventor to be in possession of the 
claimed invention at the time of filing, even if every nuance of the claims is not 
explicitly described in the specification, then the adequate description requirement 
is met. 

Thus, the written description standard is fulfilled by both what is specifically disclosed 
and what is conventional or well known to one skilled in the art. 

SEQ ID NO:1 and SEQ ID NO:7 are specifically disclosed in the priority application 
Serial No. 09/156,513 (see, for example, page 2, lines 34-37 and page 3, lines 13-14). Variants 
of SEQ ID NO:1 and SEQ ID NO:7 are described, for example, at page 2, line 38 through page 3, 
line 2. In particular, the preferred, more preferred, and most preferred variants (80%, 90%, and 
95% amino acid sequence similarity to SEQ ID NO:1) are described, for example, at page 12, 
lines 13-16 of priority application Serial No. 09/156,513. Incyte clones in which the nucleic 
acids encoding the human HGPRP-1 (SEQ ID NO:1) were first identified and libraries from 
which those clones were isolated are described, for example, at page 11, lines 24-30 and Table 1 
of the priority application. Chemical and structural features of SEQ ID NO:1 are described, for 
example, on page 11, lines 31-35 and Table 2 of the priority application. Given SEQ ID NO:1, 
one of ordinary skill in the art would recognize naturally-occurring variants of SEQ ID NO:1 
having at least 90% sequence identity to SEQ ID NO:1. Accordingly, the Specification provides 
an adequate written description of the recited polypeptide sequences. 

A. The Specification provides an adequate written description of the claimed "variants" of 
SEQ ID NO:1. 

The Office Action has further asserted that the claims are not supported by an adequate 
written description because: 

Claims 1-6 contain "subject matter which is not described in the specification in 
such a way as to reasonably convey to one skilled in the relevant art that the 
inventor(s), at the time the application was filed, had possession of the claimed 
invention". 

(page 8 of the Final Office Action) 

Such a position is believed to present a misapplication of the law. 

1. The present claims specifically define the claimed genus through the recitation of 
chemical structure 

Court cases in which "DNA claims" have been at issue (which are hence relevant to 
claims to proteins encoded by the DNA and antibodies which specifically bind to the proteins) 
commonly emphasize that the recitation of structural features or chemical or physical properties 
are important factors to consider in a written description analysis of such claims. For example, in 
Fiers v. Revel, 25 USPQ2d 1601, 1606 (Fed. Cir. 1993), the court stated that: 

If a conception of a DNA requires a precise definition, such as by structure, 
formula, chemical name or physical properties, as we have held, then a description 
also requires that degree of specificity. 

In a number of instances in which claims to DNA have been found invalid, the courts 
have noted that the claims attempted to define the claimed DNA in terms of functional 
characteristics without any reference to structural features. As set forth by the court in University 
of California v. Eli Lilly and Co., 43 USPQ2d 1398, 1406 (Fed. Cir. 1997): 

In claims to genetic material, however, a generic statement such as "vertebrate 
insulin cDNA" or "mammalian insulin cDNA," without more, is not an adequate 
written description of the genus because it does not distinguish the claimed genus 
from others, except by function. 






Thus, the mere recitation of functional characteristics of a DNA, without the definition of 
structural features, has been a common basis by which courts have found invalid claims to DNA. 
For example, in Lilly, 43 USPQ2d at 1407, the court found invalid for violation of the written 
description requirement the following claim of U.S. Patent No. 4,652,525: 

1. A recombinant plasmid replicable in procaryotic host containing within its 
nucleotide sequence a subsequence having the structure of the reverse transcript of 
an mRNA of a vertebrate, which mRNA encodes insulin. 

In Fiers, 25 USPQ2d at 1603, the parties were in an interference involving the following 

count: 

A DNA which consists essentially of a DNA which codes for a human fibroblast 
interferon-beta polypeptide. 

Party Revel in the Fiers case argued that its foreign priority application contained an 
adequate written description of the DNA of the count because that application mentioned a 
potential method for isolating the DNA. The Revel priority application, however, did not have a 
description of any particular DNA structure corresponding to the DNA of the count. The court 
therefore found that the Revel priority application lacked an adequate written description of the 
subject matter of the count. 

Thus, in Lilly and Fiers, nucleic acids were defined on the basis of functional 
characteristics and were found not to comply with the written description requirement of 35 
U.S.C. § 112; i.e., "an mRNA of a vertebrate, which mRNA encodes insulin" in Lilly, and "DNA 
which codes for a human fibroblast interferon-beta polypeptide" in Fiers. In contrast to the 
situation in Lilly and Fiers, the claims at issue in the present application define polynucleotides 
and polypeptides in terms of chemical structure, rather than functional characteristics. For 
example, the "variant language" of independent claim 1 recites chemical structure to define the 
claimed genus: 

1. An isolated cDNA comprising a nucleic acid encoding an amino acid sequence 
selected from: ... c) a variant of SEQ ID NO:1 having at least 90% amino acid 
sequence identity to SEQ ID NO:1 ... 

From the above it should be apparent that the claims of the subject application are 
fundamentally different from those found invalid in Lilly and Fiers. The subject matter of the 
present claims is defined in terms of the chemical structure of SEQ ID NO:1. In the present case, 
there is no reliance merely on a description of functional characteristics of the polynucleotides or 
polypeptides recited by the claims. In fact, there is no recitation of functional characteristics. 
Moreover, if such functional recitations were included, it would add to the structural 
characterization of the recited polynucleotides or polypeptides. The polynucleotides or 
polypeptides defined in the claims of the present application recite structural features, and cases 
such as Lilly and Fiers stress that the recitation of structure is an important factor to consider in a 
written description analysis of claims of this type. By failing to base its written description 
inquiry "on whatever is now claimed," the Office Action failed to provide an appropriate analysis 
of the present claims and how they differ from those found not to satisfy the written description 
requirement in Lilly and Fiers. 

2. The present claims do not define a genus which is "highly variant" 

Furthermore, the claims at issue do not describe a genus which could be characterized as 
highly variant, i.e., "encompassing a substantial variety of subgenera" (Final Office Action, page 
8). Available evidence illustrates that the claimed genus is of narrow scope. 

In support of this assertion, the Examiner's attention is directed to the reference by 
Brenner et al. ("Assessing sequence comparison methods with reliable structurally identified 
distant evolutionary relationships," Proc. Natl. Acad. Sci. USA (1998) 95:6073-6078; cited at 
page 29 of the instant application). Through exhaustive analysis of a data set of proteins with 
known structural and functional relationships and with <90% overall sequence identity, Brenner 
et al. have determined that 30% identity is a reliable threshold for establishing evolutionary 
homology between two sequences aligned over at least 150 residues. (Brenner et al., pages 6073 
and 6076.) Furthermore, local identity is particularly important in this case for assessing the 
significance of the alignments, as Brenner et al. further report that ≥40% identity over at least 70 
residues is reliable in signifying homology between proteins. (Brenner et al., page 6076.) 

The present application is directed, inter alia, to GPCR proteins, in particular, 
metabotropic glutamate GPCR proteins related to the amino acid sequence of SEQ ID NO:1. In 
accordance with Brenner et al., naturally occurring molecules may exist which could be 
characterized as metabotropic glutamate GPCR proteins and which have as little as 40% identity 
over at least 70 residues to SEQ ID NO:1. The "variant language" of the present claims recites, 
for example, polynucleotides encoding "an amino acid sequence having at least 90% amino acid 
sequence identity to SEQ ID NO:1" (note that SEQ ID NO:1 has 441 amino acid residues). This 
variation is far less than that of all potential metabotropic glutamate GPCR proteins related to 
SEQ ID NO:1, i.e., those metabotropic glutamate GPCR proteins having as little as 40% identity 
over at least 70 residues to SEQ ID NO:1. 

3. The state of the art at the time of the present invention is further advanced than at 
the time of the Lilly and Fiers applications 

In the Lilly case, claims of U.S. Patent No. 4,652,525 were found invalid for failing to 
comply with the written description requirement of 35 U.S.C. § 112. The '525 patent claimed the 
benefit of priority of two applications, Application Serial No. 801,343 filed May 27, 1977, and 
Application Serial No. 805,023 filed June 9, 1977. In the Fiers case, party Revel claimed the 
benefit of priority of an Israeli application filed on November 21, 1979. Thus, the written 
description inquiry in those cases was based on the state of the art at essentially the "dark ages" 
of recombinant DNA technology. 

The present application has a priority date of September 17, 1998. Much has happened in 
the development of recombinant DNA technology in the 20 or more years between the time of filing 
of the applications involved in Lilly and Fiers and that of the present application. For example, the 
technique of polymerase chain reaction (PCR) was invented. Highly efficient cloning and DNA 
sequencing technology has been developed. Large databases of protein and nucleotide sequences 
have been compiled. Much of the raw material of the human and other genomes has been 
sequenced. With these remarkable advances one of skill in the art would recognize that, given 
the sequence information of SEQ ID NO: 1 and SEQ ID NO:7, and the additional extensive detail 
provided by the subject application, the present inventors were in possession of the claimed 
polynucleotide variants at the time of filing of this application. 

4. Summary 

The Office Action failed to base its written description inquiry "on whatever is now 
claimed." Consequently, the Action did not provide an appropriate analysis of the present claims 
and how they differ from those found not to satisfy the written description requirement in cases 
such as Lilly and Fiers. In particular, the claims of the subject application are fundamentally 
different from those found invalid in Lilly and Fiers. The subject matter of the present claims is 
defined in terms of the chemical structure of SEQ ID NO:1 or SEQ ID NO:7. The courts have 
stressed that structural features are important factors to consider in a written description analysis 
of claims to nucleic acids and proteins. In addition, the genus of polynucleotides or polypeptides 
defined by the present claims is adequately described, as evidenced by Brenner et al. and 
consideration of the claims of the '740 patent involved in Lilly. Furthermore, there have been 
remarkable advances in the state of the art since the Lilly and Fiers cases, and these advances 
were given no consideration whatsoever in the position set forth by the Office Action. 

Claims 1 and 3-6 stand rejected under 35 U.S.C. § 102(b) as anticipated by Valenzuela et 
al. (WO 99/55271, November 4, 1999) and, alternatively, under 35 U.S.C. § 102(e) as anticipated 
by Moore et al. (U.S. Published Application 2003005536, effective filing date June 17, 1999). 
The rejection alleges in particular, that: 

Valenzuela disclose a nucleic acid molecule (SEQ ID NO:43, claim 52) that encodes a 
protein (SEQ ID NO:45, claim 53) that is 100% identical to the polypeptide of SEQ ID 
NO:7 of the instant application, thus anticipating the claims. Valenzuela et al. also teach 
vectors, host cells, a method of producing protein, and labeled cDNA. 

Moore et al. disclose a nucleic acid molecule (SEQ ID NO:22) that encodes a protein 
(SEQ ID NO:146) that is 100% identical to the polypeptide of SEQ ID NO:1 from amino 
acids 1-384 of the instant application, and therefore discloses an isolated cDNA encoding 
a fragment of SEQ ID NO:1 from I51-V72, G88-V109, C116-A145, I156-L175, M207- 
P229, or G242-T264 of SEQ ID NO:1, as recited in claim 1. Moore et al. also teach 
vectors, host cells, and a method of making a protein, therefore anticipating claims 3-6 as 
well. 

Because the instant application does not meet the requirements of 35 U.S.C. § 112, first 
paragraph, for the reasons given above, and it is a continuation of application Serial No. 
09/516,513, the prior application does not meet these requirements and therefore is 
unavailable under 35 U.S.C. § 120. Under these circumstances, Valenzuela et al. and 
Moore et al. anticipate the claimed invention. 

The now claimed invention, at least as recited in claims 1 and 3-6, is supported by 
both a specific and substantial asserted utility and a well established utility that is disclosed 
and enabled in priority application Serial No. 09/516,513 

Applicants submit that, for the reasons cited above in response to the rejection of claims 
under 35 U.S.C. §§ 101/112, the specification supports a specific and substantial asserted utility, 
as well as a well established utility for the claimed invention that is similarly disclosed in the 
priority application Serial No. 09/516,513 in accordance with 35 U.S.C. § 120, therefore 
providing an effective filing date for the instant application of September 17, 1998. 

Due to the urgency of this matter and its economic and public health implications, an 
expedited review of this appeal is earnestly solicited. 

If the USPTO determines that any additional fees are due, the Commissioner is hereby 
authorized to charge Deposit Account No. 09-0108. 

This brief is enclosed in triplicate. 



Respectfully submitted, 

APPENDIX - CLAIMS ON APPEAL 

1. An isolated cDNA comprising a nucleic acid encoding an amino acid sequence selected 
from: 

a) an amino acid sequence of SEQ ID NO:1; 

b) a fragment of SEQ ID NO:1 from I51-V72, G88-V109, C116-A145, I156-L175, 
M207-P229, or G242-T264 of SEQ ID NO:1; 

c) a variant of SEQ ID NO:1 having at least 90% amino acid sequence identity to SEQ 
ID NO:1; and 

d) the complement of the encoding nucleic acid sequence of a), b), or c). 

2. An isolated cDNA comprising a nucleic acid sequence selected from: 

a) SEQ ID NO:7; and 

b) a variant of SEQ ID NO:7 having at least 95% identity to SEQ ID NO:7. 

3. A composition comprising the cDNA of claim 1 and a labeling moiety. 

4. A vector comprising the cDNA of claim 1. 

5. A host cell comprising the vector of claim 4. 

6. A method for using a cDNA to produce a protein, the method comprising: 

a) culturing the host cell of claim 5 under conditions for protein expression; and 

b) recovering the protein from the host cell culture. 




Docket No.: PC-0044 CI? 
USSN: 09/895,686 
Ref. No. 1 of 6 



Proc. Natl. Acad. Sci. USA 

Vol. 94, pp. 8945-8947, August 1997 

Applied Biological Sciences 



Whole genome analysis: Experimental access to all genome 
sequenced segments through larger-scale efficient 
oligonucleotide synthesis and PCR 

DEVAL A. LASHKARI*†, JOHN H. MCCUSKER*, AND RONALD W. DAVIS*§ 

... Department of Microbiology, 3020 Duke University 

Contributed by Ronald W. Davis, May 20, 1997 



ABSTRACT The recent ability to sequence whole genomes 
allows ready access to all genetic material. The approaches 
outlined here allow automated analysis of sequence for the 
synthesis of optimal primers in an automated multiplex 
oligonucleotide synthesizer (AMOS). The efficiency is such 
that all ORFs for an organism can be amplified by PCR. The 
resulting amplicons can be used directly in the construction of 
DNA arrays or can be cloned for a large variety of functional 
analyses. These tools allow a replacement of single-gene 
analysis with a highly efficient whole-genome analysis. 

The genome sequencing projects have generated and will 
continue to generate enormous amounts of sequence data. The 
genomes of Saccharomyces cerevisiae, Escherichia coli, Hae- 
mophilus influenzae (1), Mycoplasma genitalium (2), and Meth- 
anococcus jannaschii (3) have been completely sequenced. 
Other model organisms have had substantial portions of their 
genomes sequenced as well, including the nematode Caeno- 
rhabditis elegans (4) and the small flowering plant Arabidopsis 
thaliana (5). This massive and increasing amount of sequence 
information allows the development of novel experimental 
approaches to identify gene function. 

One standard use of genome sequence data is to attempt to 
identify the functions of predicted open reading frames 
(ORFs) within the genome by comparison to genes of known 
function. Such a comparative analysis of all ORFs to existing 
sequence data is fast, simple, and requires no experimentation 
and is therefore a reasonable first step. While finding sequence 
homologies/motifs is not a substitute for experimentation, 
noting the presence of sequence homology and/or sequence 
motifs can be a useful first step in finding interesting genes, in 
designing experiments and, in some cases, predicting function. 
However, this type of analysis is frequently uninformative. For 
example, over one-half of new ORFs in S. cerevisiae have no 
known function (6). If this is the case in a well studied organism 
such as yeast, the problem will be even worse in organisms that 
are less well studied or less manipulable. A large, experimen- 
tally determined gene function database would make homol- 
ogy/motif searches much more useful. 

Experimental analysis must be performed to thoroughly 
understand the biological function of a gene product. Scaling 
up from classical "cottage industry" one-gene-oriented ap- 
proaches to whole-genome analysis would be very expensive 
and laborious. It is clear that novel strategies are necessary to 
efficiently pursue the next phase of the genome projects— 
whole-genome experimental analysis to explore gene expres- 
sion, gene product function, and other genome functions. 
Model organisms, such as S. cerevisiae, will be extremely 
important in the development of novel whole-genome analysis 
techniques and, subsequently, in improving our understanding 
of other more complex and less manipulable organisms. 

The publication costs of this article were defrayed in part by page charge 
payment. This article must therefore be hereby marked "advertisement" in 
accordance with 18 U.S.C. §1734 solely to indicate this fact. 

The genome sequence can be systematically used as a tool 
to understand ORFs, gene product function, and other ge- 
nome regions. Toward this end, a directed strategy has been 
developed for exploiting sequence information as a means of 
providing information about biological function (Fig. 1). Ef- 
forts have been directed toward the amplification of each 
predicted ORF or any other region of the genome ranging 
from a few base pairs to several kilobase pairs. There are many 
uses for these amplicons— they can be cloned into standard 
vectors or specialized expression vectors, or can be cloned into 
other specialized vectors such as those used for two-hybrid 
analysis. The amplicons can also be used directly by, for 
example, arraying onto glass for expression analysis, for DNA 
binding assays, or for any direct DNA assay (7). As a pilot 
study, synthetic primers were made on the 96-well automated 
multiplex oligonucleotide synthesizer (AMOS) instrument (8) 
(Fig. 2). These oligonucleotides were used to amplify each 
ORF on yeast chromosome V. The current version of this 
instrument can synthesize three plates of 96 oligonucleotides 
each (25 bases) in an 8-hr day. The amplification of the entire 
set of PCR products was then analyzed by gel electrophoresis 
(Fig. 3). Successful amplification of the proper length product 
on the first attempt was 95%. This project demonstrates that 
one can go directly from sequence information to biological 
analysis in a truly automated, totally directed manner. 
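
The synthesis throughput quoted above lends itself to a quick back-of-the-envelope check. The following Python sketch is an illustration only and is not part of the original work. It estimates how many 96-well plates and 8-hr instrument days a primer set would need, assuming two primers per ORF and the stated rate of three plates per day; the function name and the example ORF counts are assumptions.

```python
import math

OLIGOS_PER_PLATE = 96   # 96-well synthesis format, as stated in the text
PLATES_PER_DAY = 3      # three plates per 8-hr day, as stated in the text


def synthesis_estimate(n_orfs: int, primers_per_orf: int = 2) -> tuple[int, int, int]:
    """Return (primers, plates, days) needed to cover n_orfs ORFs."""
    primers = n_orfs * primers_per_orf
    plates = math.ceil(primers / OLIGOS_PER_PLATE)
    days = math.ceil(plates / PLATES_PER_DAY)
    return primers, plates, days
```

Under these assumptions, a set of 240 ORFs would need 480 primers, 5 plates, and 2 instrument days, while 6,000 ORFs would need 12,000 primers, 125 plates, and 42 instrument days.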

These amplicons can be incorporated directly in arrays or 
the amplicons can be cloned. If the amplicons are to be cloned, 
novel sequences can be incorporated at the 5' end of the 
oligonucleotide to facilitate cloning. One potential problem 
with cloning PCR products is that the cloned amplicons may 
contain sequence alterations that diminish their utility. One 
option would be to resequence each individual amplicon. 
However, this is expensive, inefficient, and time consuming. A 
faster, more cost-effective, and more accurate approach is to 
apply comparative sequencing by denaturing HPLC (9). This 
method is capable of detecting a single base change in a 2-kb 
heteroduplex. Longer amplicons can be analyzed by use of 
appropriate restriction fragments. If any change is detected in 
a clone, an alternate clone of the same region can be analyzed. 
Modifying the system to allow high throughput analysis by 
denaturing HPLC is also relatively simple and straightforward. 

If amplicons are used directly on arrays without cloning, it 
is important to note that, even if single PCR product bands are 
observed on gels, the PCR products will be contaminated with 
various amounts of other sequences. This contamination has 
the potential to affect the results in, for example, expression 
analysis. On the other hand, direct use of the amplicons is 
much less labor intensive and greatly decreases the occurrence 
of mistakes in clone identification, a ubiquitous problem 
associated with large clone set archiving and retrieving. 

†Present address: Synteni, Inc., 6519 Dumbarton Circle, Fremont, CA 
94555. 

... Biochemistry, Beckman Center, B400, Stanford University, Stanford, 
CA 94305-5307. e-mail: gilbert@cmgm.stanford.edu. 

Fig. 1. Overview of systematic method for isolating individual 
genes. Sequence information is obtained automatically from sequence 
databases. The data are input into primer selection software specifi- 
cally designed to target ORFs as designated by database annotations. 
The output file containing the primer information is directly read by 
a high-throughput oligonucleotide synthesizer, which makes the oli- 
gonucleotides in 96-well plates (AMOS, automated multiplex oligo- 
nucleotide synthesizer). The forward and reverse primers are synthe- 
sized in the same location on separate plates to facilitate the down- 
stream handling of primers. The amplicons are generated by PCR in 
96-well plates as well. 

Any large-scale effort to capture each ORF within a genome 
must rely on automation if cost is to be minimized while 
efficiency is maximized. Toward that end, primers targeting 
ORFs were designed automatically using simple new scripts 
and existing primer selection software. These script-selected 
primer sequences were directly read by the high-throughput 
synthesizer and the forward and reverse primers were synthe- 
sized in separate plates in corresponding wells to facilitate 
automated pipetting and PCR amplifications. Each of the 
resulting PCR products, generated with minimum labor, con- 
tains a known, unique ORF. 
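
As a hedged illustration of the automated primer selection described above, the following Python sketch derives a forward and a reverse primer from an ORF sequence. It is not the authors' software: the function names, the default 25-base length (borrowed from the oligonucleotide length mentioned earlier), and the naive "take the ends" rule are all assumptions. The original work relied on existing primer selection software whose criteria are not given here.

```python
# Complement table for unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def design_primers(orf: str, length: int = 25) -> tuple[str, str]:
    """Naive primer pair for an ORF (illustrative assumption only).

    Forward primer: the first `length` bases of the ORF.
    Reverse primer: the reverse complement of the last `length` bases.
    Real primer selection also weighs melting temperature, GC content,
    and self-complementarity; none of that is modeled here.
    """
    orf = orf.upper()
    if len(orf) < 2 * length:
        raise ValueError("ORF too short for non-overlapping primers")
    forward = orf[:length]
    reverse = reverse_complement(orf[-length:])
    return forward, reverse
```

Because each ORF yields exactly one forward and one reverse primer, the two outputs can be written to the same well position on separate plates, which mirrors the plate layout described in the text.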

Large-scale genome analysis projects are dependent on 
newly emerging technologies to make the studies practical and 
economically feasible. For example, the cost of the primers, a 
significant issue in the past, has been reduced dramatically to 
make feasible this and other projects that require tens of 
thousands of oligonucleotides. Other methods of high- 
throughput analysis are also vital to the success of functional 
analysis projects, such as microarraying and oligonucleotide 
chip methods (10-14). 

Changes in attitude are also required. One of the major costs 
of commercial oligonucleotides is extensive quality control 
such that virtually 100% of the supplied oligonucleotides are 
successfully synthesized and work for their intended purpose. 
Considerable cost reduction can be obtained by simply de- 
creasing the expected successful synthesis rate to 95-97%. One 
can then achieve faster and cheaper whole genome coverage by 
simply adding a single quality control at the end of the 
experiment and batching the failures for resynthesis. 

Fig. 2. Overall approach for using database of a genome to direct 
biological analysis. The synthesis of the 6,000 ORFs (orfs) for each 
gene of S. cerevisiae can be used in many applications utilizing both 
cloning and microarraying technology. 

The directed nature of the amplicon approach is of clear 
advantage. The sequence of each ORF is analyzed automati- 
cally, and unique specific primers are made to target each 
ORF. Thus, there is relatively little time or labor involved— for 
example, no random cloning and subsequent screening is 
required because each product is known. In the test system, 
primers for 240 ORFs from chromosome V were systematically 
synthesized, beginning from the left arm and continuing 
through to the right arm. At no point was there any manual 
analysis of sequence information to generate the collection. In 
many ways, now that the sequence is known, there is no need 
for the researcher to examine it. 

These amplicons can be arrayed and expression analysis can 
be done on all arrayed ORFs with a single hybridization (10). 
Those ORFs that display significant differential expression 
patterns under a given selection are easily identified without 
the laborious task of searching for and then sequencing a clone. 
Once scaled up, the procedure provides even greater returns 
on effort, because a single hybridization will ultimately provide 
a "snapshot" of the expression of all genes in the yeast genome. 
Thus, the limiting factor in whole genome analysis will not be 
the analysis process itself, but will instead be the ability of 
researchers to design and carry out experimental selections. 

Current expression and genetic analysis technologies are 
geared toward the analysis of single genes and are ill suited to 
analyze numerous genes under many conditions. Additional 
difficulties with current technologies include: the effort and 
expense required to analyze expression and make mutants, the 
potential duplication of effort if done by different laboratories, 
and the possibility of conflicting results obtained from differ- 
ent laboratories. In contrast, whole genome analysis not only 
is more efficient, it also provides data of much higher quality; 
all genes are assayed and compared in parallel under exactly 
the same conditions. In addition, amplicons have many appli- 
cations beyond gene expression. For example, one recent 
approach is to incorporate a unique DNA sequence tag, 
synthesized as part of each gene specific primer, during 
amplification. The tags or molecular bar codes, when reintro- 
duced into the organism as a gene deletion or as a gene clone, 
can be used much more efficiently than individual mutations 
or clones because pools of tagged mutants or transformants 
can be analyzed in parallel. This parallel analysis is possible 
because the tags are readily and quantitatively amplified even 
in complex mixtures of tags (13). 

These ORF genome arrays and oligonucleotide tagged 
libraries can be used for many applications. Any conventional 
selection applied to a library that gives discrete or multiple 
products can use these technologies for a simple direct read- 
out. These include screens and selections for mutant comple- 
mentation, overexpression suppression (15, 16), second-site 
suppressors, synthetic lethality, drug target overexpression 
(17), two-hybrid screens (18), genome mismatch scanning (19), 
or recombination mapping. 

The genome projects have provided researchers with a vast 
amount of information. These data must be used efficiently 
and systematically to gain a truly comprehensive understand- 
ing of gene function and, more broadly, of the entire genome 
which can then be applied to other organisms. Such global 
approaches are essential if we are to gain an understanding of 
the living cell. This understanding should come from the 
viewpoint of the integration of complex regulatory networks, 
the individual roles and interactions of thousands of functional 
gene products, and the effect of environmental changes on 
both gene regulatory networks and the roles of all gene 
products. The time has come to switch from the analysis of a 
single gene to the analysis of the whole genome. 

Support was provided by National Institutes of Health Grants 
R37HG00198 and P01HG00205. 

The availability of genome-scale DNA sequence information and reagents has radically altered life-science 
research. This revolution has led to the development of a new scientific subdiscipline derived from a combina- 
tion of the fields of toxicology and genomics. This subdiscipline, termed toxicogenomics, is concerned with the 
identification of potential human and environmental toxicants, and their putative mechanisms of action, through 
the use of genomics resources. One such resource is DNA microarrays or "chips," which allow the monitoring of 
the expression levels of thousands of genes simultaneously. Here we propose a general method by which gene 
expression, as measured by cDNA microarrays, can be used as a highly sensitive and informative marker for 
toxicity. Our purpose is to acquaint the reader with the development and current state of microarray technol- 
ogy and to present our view of the usefulness of microarrays to the field of toxicology. Mol. Carcinog. 24:153-159, 1999. 
© 1999 Wiley-Liss, Inc. 

INTRODUCTION 

Technological advancements combined with in- 
tensive DNA sequencing efforts have generated an 
enormous database of sequence information over the 
past decade. To date, more than 3 million sequences, 
totaling over 2.2 billion bases [1], are contained 
within the GenBank database, which includes the 
complete sequences of 19 different organisms [2]. The 
first complete sequence of a free-living organism, 
Haemophilus influenzae, was reported in 1995 [3] and 
was followed shortly thereafter by the first complete 
sequence of a eukaryote, Saccharomyces cerevisiae [4]. 
The development of dramatically improved sequenc- 
ing methodologies promises that complete elucida- 
tion of the Homo sapiens DNA sequence is not far 
behind [5]. 

To exploit more fully the wealth of new sequence 
information, it was necessary to develop novel meth- 
ods for the high-throughput or parallel monitoring 
of gene expression. Established methods such as 
northern blotting, RNAse protection assays, S1 nu- 
clease analysis, plaque hybridization, and slot blots 
do not provide sufficient throughput to effectively 
utilize the new genomics resources. Newer methods 
such as differential display [6], high-density filter 
hybridization [7,8], serial analysis of gene expression 
[9], and cDNA- and oligonucleotide-based microarray 
"chip" hybridization [10-12] are possible solutions 
to this bottleneck. It is our belief that the microarray 
approach, which allows the monitoring of expres- 
sion levels of thousands of genes simultaneously, is 
a tool of unprecedented power for use in toxicology 
studies. 



Almost without exception, gene expression is al- 
tered during toxicity, as either a direct or indirect 
result of toxicant exposure. The challenge facing 
toxicologists is to define, under a given set of ex- 
perimental conditions, the characteristic and spe- 
cific pattern of gene expression elicited by a given 
toxicant. Microarray technology offers an ideal plat- 
form for this type of analysis and could be the foun- 
dation for a fundamentally new approach to 
toxicology testing. 

MICROARRAY DEVELOPMENT AND APPLICATIONS 
cDNA Microarrays 

In the past several years, numerous systems were 
developed for the construction of large-scale DNA 
arrays. All of these platforms are based on cDNAs 
or oligonucleotides immobilized to a solid sup- 
port. In the cDNA approach, cDNA (or genomic) 
clones of interest are arrayed in a multi-well for- 
mat and amplified by polymerase chain reaction. 
The products of this amplification, which are usu- 
ally 500- to 2000-bp clones from the 3' regions of 
the genes of interest, are then spotted onto solid 
support by using high-speed robotics. By using 
this method, microarrays of up to 10 000 clones 
can be generated by spotting onto a glass substrate 
[13,14]. Sample detection for microarrays on glass 
involves the use of probes labeled with fluores- 
cent or radioactive nucleotides. 

Fluorescent cDNA probes are generated from con- 
trol and test RNA samples in single-round reverse-tran- 
scription reactions in the presence of fluorescently 
tagged dUTP (e.g., Cy3-dUTP and Cy5-dUTP), which 
produces control and test products labeled with dif- 
ferent fluors. The cDNAs generated from these two 
populations, collectively termed the "probe," are then 
mixed and hybridized to the array under a glass cov- 
erslip [10,11,15]. The fluorescent signal is detected 
by using a custom-designed scanning confocal mi- 
croscope equipped with a motorized stage and lasers 
for fluor excitation [10,11,15]. The data are analyzed 
with custom digital image analysis software that de- 
termines for each DNA feature the ratio of fluor 1 to 
fluor 2, corrected for local background [16,17]. The 
strength of this approach lies in the ability to label 
RNAs from control and treated samples with differ- 
ent fluorescent nucleotides, allowing for the simul- 
taneous hybridization and detection of both 
populations on one microarray. This method elimi- 
nates the need to control for hybridization between 
arrays. The research groups of Drs. Patrick Brown and 
Ron Davis at Stanford University spearheaded the 
effort to develop this approach, which has been suc- 
cessfully applied to studies of Arabidopsis thaliana 
RNA [10], yeast genomic DNA [15], tumorigenic ver- 
sus non-tumorigenic human tumor cell lines [11], 
human T-cells [18], yeast RNA [19], and human in- 
flammatory disease-related genes [20]. The most dra- 
matic result of this effort was the first published 
account of gene expression of an entire genome, that 
of the yeast Saccharomyces cerevisiae [21]. 
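
The per-feature ratio described above can be sketched in a few lines of Python. This is a minimal illustration, not the custom image analysis software cited in the text: the function name, the choice of a log2 scale, the assignment of "test" and "control" to the two fluors, and the `floor` value used to keep corrected intensities positive are all assumptions.

```python
import math


def log2_ratio(test_signal: float, test_background: float,
               control_signal: float, control_background: float,
               floor: float = 1.0) -> float:
    """Background-corrected log2(test / control) for one array feature.

    Each channel's local background is subtracted from its signal.
    `floor` (an assumed value) prevents zero or negative intensities
    from reaching the logarithm.
    """
    test = max(test_signal - test_background, floor)
    control = max(control_signal - control_background, floor)
    return math.log2(test / control)
```

A positive value indicates higher signal in the test channel, a negative value indicates higher signal in the control channel, and zero indicates equal corrected intensities.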

In an alternative approach, large numbers of cDNA 
clones can be spotted onto a membrane support, al- 
beit at a lower density [7,22]. This method is useful 
for expression profiling and large-scale screening and 
mapping of genomic or cDNA clones [7,22-24]. In 
expression profiling on filter membranes, two dif- 
ferent membranes are used simultaneously for con- 
trol and test RNA hybridizations, or a single 
membrane is stripped and reprobed. The signal is 
detected by using radioactive nucleotides and visu- 
alized by phosphorimager analysis or autoradiogra- 
phy. Numerous companies now sell such cDNA 
membranes and software to analyze the image data 
[25-27]. 

Oligonucleotide Microarrays 

Oligonucleotide microarrays are constructed either 
by spotting prefabricated oligos on a glass support 
[13] or by the more elegant method of direct in situ 
oligo synthesis on the glass surface by photolithog- 
raphy [28-30]. The strength of this approach lies in 
its ability to discriminate DNA molecules based on 
single base-pair difference. This allows the applica- 
tion of this method to the fields of medical diagnos- 
tics, pharmacogenetics, and sequencing by hybrid- 
ization as well as gene-expression analysis. 

Fabrication of oligonucleotide chips by photoli- 
thography is theoretically simple but technically 
complex [29,30]. The light from a high-intensity 
mercury lamp is directed through a photolitho- 
graphic mask onto the silica surface, resulting in 
deprotection of the terminal nucleotides in the illu- 
minated regions. The entire chip is then reacted with 
the desired free nucleotide, resulting in selected chain 
elongation. This process requires only 4n cycles 
(where n = oligonucleotide length in bases) to syn- 
thesize a vast number of unique oligos, the total num- 
ber of which is limited only by the complexity of the 
photolithographic mask and the chip size [29,31,32]. 
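
The 4n figure follows from running, at each of the n positions, one deprotection and coupling cycle for each of the four nucleotides. The short Python sketch below makes the arithmetic concrete; the function names are illustrative assumptions, and the count of possible sequences is a combinatorial upper bound, not a statement about any particular chip.

```python
def synthesis_cycles(oligo_length: int) -> int:
    """Maximum cycles needed: one per nucleotide (A, C, G, T) per position."""
    return 4 * oligo_length


def possible_sequences(oligo_length: int) -> int:
    """Number of distinct oligonucleotides of the given length."""
    return 4 ** oligo_length
```

For example, 25-base oligonucleotides need at most 100 cycles, while the number of distinct 25-mers is 4 to the 25th power, which is why the mask complexity and chip area, not the cycle count, set the practical limit.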

Sample preparation involves the generation of 
double-stranded cDNA from cellular poly(A)+ RNA 
followed by antisense RNA synthesis in an in vitro 
transcription reaction with biotinylated or fluor- 
tagged nucleotides. The RNA probe is then frag- 
mented to facilitate hybridization. If the indirect 
visualization method is used, the chips are incubated 
with fluor-linked streptavidin (e.g., phycoerythrin) 
after hybridization [12,33]. The signal is detected with 
a custom confocal scanner [34]. This method has 
been applied successfully to the mapping of genomic 
library clones [35], to de novo sequencing by hybrid- 
ization [28,36], and to evolutionary sequence com- 
parison of the BRCA1 gene [37]. In addition, 
mutations in the cystic fibrosis [38] and BRCA1 [39] 
gene products and polymorphisms in the human im- 
munodeficiency virus-1 clade B protease gene [40] 
have been detected by this method. Oligonucleotide 
chips are also useful for expression monitoring [33] 
as has been demonstrated by the simultaneous evalu- 
ation of gene-expression patterns in nearly all open 
reading frames of the yeast strain S. cerevisiae [12]. 
More recently, oligonucleotide chips have been used 
to help identify single nucleotide polymorphisms in 
the human [41] and yeast [42] genomes. 

THE USE OF MICROARRAYS IN TOXICOLOGY 

Screening for Mechanism of Action 

The field of toxicology uses numerous in vivo 
model systems, including the rat, mouse, and rab- 
bit, to assess potential toxicity and these bioassays 
are the mainstay of toxicology testing. However, in 
the past several decades, a plethora of in vitro tech- 
niques have been developed to measure toxicity, 
many of which measure toxicant-induced DNA dam- 
age. Examples of these assays include the Ames test, 
the Syrian hamster embryo cell transformation as- 
say, micronucleus assays, measurements of sister 
chromatid exchange and unscheduled DNA synthe- 
sis, and many others. Fundamental to all of these 
methods is the fact that toxicity is often preceded 
by, and results in, alterations in gene expression. In 
many cases, these changes in gene expression are a 
far more sensitive, characteristic, and measurable 
endpoint than the toxicity itself. We therefore pro- 
pose that a method based on measurements of the 
genome-wide gene expression pattern of an organ- 
ism after toxicant exposure is fundamentally infor- 
mative and complements the established methods 
described above. 

We are developing a method by which toxicants 
can be identified and their putative mechanisms of 
action determined by using toxicant-induced gene ex- 
pression profiles. In this method, in one or more de- 
fined model systems, dose and time-course parameters 
are established for a series of toxicants within a given 
prototypic class (e.g., polycyclic aromatic hydrocar- 
bons (PAHs)). Cells are then treated with these agents 
at a fixed toxicity level (as measured by cell survival), 
RNA is harvested, and toxicant-induced gene expres- 
sion changes are assessed by hybridization to a cDNA 
microarray chip (Figure 1). We have developed a cus- 
tom DNA chip, called ToxChip v1.0, specifically for 
this purpose and will discuss it in more detail below. 
The changes in gene expression induced by the test 
agents in the model systems are analyzed, and the 
common set of changes unique to that class of toxi- 
cants, termed a toxicant signature, is determined. 

This signature is derived by ranking across all ex- 
periments the gene-expression data based on rela- 
tive fold induction or suppression of genes in treated 
samples versus untreated controls and selecting the 
most consistently different signals across the sample 
set. A different signature may be established for each 
prototypic toxicant class. Once the signatures are de- 
termined, gene-expression profiles induced by un- 
known agents in these same model systems can then 
be compared with the established signatures. A match 
assigns a putative mechanism of action to the test 
compound. Figure 2 illustrates this signature method 
for different types of oxidant stressors, PAHs, and 
peroxisome proliferators. In this example, the un- 
known compound in question had a gene-expres- 
sion profile similar to that of the oxidant stressors in 
the database. We anticipate that this general method 
will also reveal cross talk between different pathways 
induced by a single agent (e.g., reveal that a com- 
pound has both PAH-like and oxidant-like proper- 
ties). In the future, it may be necessary to distinguish 
very subtle differences between compounds within 
a very large sample set (e.g., thousands of highly simi- 
lar structural isomers in a combinatorial chemistry 
library or peptide library). To generate these highly 
refined signatures, standard statistical clustering tech- 
niques or principal-component analysis can be used. 
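
The signature method described above can be illustrated with a deliberately simplified Python sketch. It is not the authors' procedure: instead of ranking genes across experiments, it applies a fixed threshold to log2 fold changes and keeps genes that change in the same direction in every profile of a toxicant class. The function names, the dictionary data layout, and the threshold of 1.0 are assumptions made for illustration.

```python
def toxicant_signature(profiles, min_abs_log2=1.0):
    """Build a signature from profiles of one toxicant class.

    profiles: list of dicts mapping gene name -> log2 fold change
    (treated versus untreated control), one dict per toxicant.
    Returns a dict mapping gene -> +1 (induced) or -1 (suppressed) for
    genes that pass the threshold in the same direction in every profile.
    """
    if not profiles:
        return {}
    shared_genes = set.intersection(*(set(p) for p in profiles))
    signature = {}
    for gene in shared_genes:
        values = [p[gene] for p in profiles]
        if all(v >= min_abs_log2 for v in values):
            signature[gene] = 1
        elif all(v <= -min_abs_log2 for v in values):
            signature[gene] = -1
    return signature


def match_score(signature, unknown_profile, min_abs_log2=1.0):
    """Fraction of signature genes that the unknown agent changes
    in the same direction and beyond the same threshold."""
    if not signature:
        return 0.0
    hits = 0
    for gene, direction in signature.items():
        value = unknown_profile.get(gene, 0.0)
        if direction * value >= min_abs_log2:
            hits += 1
    return hits / len(signature)
```

In use, one signature would be built per prototypic toxicant class, and an unknown compound would be scored against each; a high score for one class suggests a putative mechanism of action, and high scores for several classes would be consistent with the cross talk mentioned in the text.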

For the studies outlined in Figure 2, we developed 
the custom cDNA microarray chip ToxChip vl.O. 
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Figure 1 . Simplified overview of the method for sample trative purposes, samples derived from cell culture are depicted 
preparation and hybridization to cDNA microarrays. For illus- although other sample types are amenable to this analysis. 
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Figure 2. Schematic representation of the method for iden- 
tification of a toxicant's mechanism of action. In this method, 
gene-expression data derived from exposure of model sys- 
tems to known toxicants are analyzed, and a set of changes 
characteristic to that type of toxicant (termed the toxicant 
signature) is identified. As depicted, oxidant stressors produce 

The 2090 human genes that comprise this subarray 
were selected for their well-documented involve- 
ment in basic cellular processes as well as their re- 
sponses to different types of toxic insult. Included 
on this list are DNA replication and repair genes, 
apoptosis genes, and genes responsive to PAHs and 
dioxin-like compounds, peroxisome proliferators, 
estrogenic compounds, and oxidant stress. Some of 
the other categories of genes include transcription 
factors, oncogenes, tumor suppressor genes, cycUns, 
kinases, phosphatases, cell adhesion and motility 
genes, and homeobox genes. Also included in this 
group are 84 housekeeping genes, whose hybridiza- 
tion intensity is averaged and used for signal nor- 
malization of the other genes on the chip. To date, 
very few toxicants have been shown to have appre- 
ciable effects on the expression of these housekeep- 
ing genes. However, this housekeeping list will be 
revised if new data warrant the addition or deletion 
of a particular gene. Table 1 contains a general de- 
scription of some of the different classes of genes 
that comprise ToxChip v1.0. 
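
The housekeeping normalization described above can be illustrated with a minimal sketch. It assumes the simplest reading of the text: every spot intensity is divided by the mean intensity of the housekeeping genes on the same chip. The gene names and intensity values are placeholders and do not reflect the actual chip content.

```python
def normalize_to_housekeeping(intensities, housekeeping_genes):
    """Divide every spot intensity by the mean housekeeping intensity."""
    hk_values = [intensities[g] for g in housekeeping_genes]
    hk_mean = sum(hk_values) / len(hk_values)
    if hk_mean <= 0:
        raise ValueError("housekeeping mean must be positive")
    return {gene: value / hk_mean for gene, value in intensities.items()}

# Invented raw intensities; gene names are placeholders.
raw = {"HK1": 900.0, "HK2": 1100.0, "GENE_A": 2500.0, "GENE_B": 250.0}
normalized = normalize_to_housekeeping(raw, ["HK1", "HK2"])
```

After this step, values from different hybridizations share a common scale, provided the housekeeping genes themselves are unaffected by the treatment.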

When a toxicant signature is determined, the 
genes within this signature are flagged within the 
database. When uncharacterized toxicants are then 
screened, the data can be quickly reformatted so that 
blocks of genes representing the different signatures 
consistent changes in group A genes (indicated by red and 
green circles), but not group B or C genes (indicated by gray 
circles). The set of gene-expression changes elicited by the 
suspected toxicant is then compared with these characteristic 
patterns, and a putative mechanism of action is assigned to 
the unknown agent. 

are displayed [11]. This facilitates rapid, visual in- 
terpretation of data. We are also developing Tox- 
Chip v2.0 and chips for other model systems, 
including rat, mouse, Xenopus, and yeast, for use in 
toxicology studies. 

Animal Models in Toxicology Testing 

The toxicology community relies heavily on the 
use of animals as model systems for toxicology test- 
ing. Unfortunately, these assays are inherently ex- 
pensive, require large numbers of animals and take a 
long time to complete and analyze. Therefore, the 
National Institute of Environmental Health Sciences 
(NIEHS), the National Toxicology Program, and the 
toxicology community at large are committed to re- 
ducing the number of animals used, by developing 
more efficient and alternative testing methodologies. 
Although substantial progress has been made in the 
development of alternative methods, bioassays are 
still used for testing endpoints such as neurotoxic- 
ity, immunotoxicity, reproductive and developmen- 
tal toxicology, and genetic toxicology. The rodent 
cancer bioassay is a particularly expensive and time- 
consuming assay, as it requires almost 4 yr, 1200 
animals, and millions of dollars to execute and ana- 
lyze [43]. In vitro experiments of the type outlined 
in Figure 2 might provide evidence that an unknown 
Table 1. ToxChip v1.0: A Human cDNA Microarray 
Chip Designed to Detect Responses to Toxic Insult 

Gene category                            No. of genes on chip 

Apoptosis                                  72 
DNA replication and repair                 99 
Oxidative stress/redox homeostasis         90 
Peroxisome proliferator responsive         22 
Dioxin/PAH responsive                      12 
Estrogen responsive                        63 
Housekeeping                               84 
Oncogenes and tumor suppressor genes       76 
Cell-cycle control                         51 
Transcription factors                     131 
Kinases                                   276 
Phosphatases                               88 
Heat-shock proteins                        23 
Receptors                                 349 
Cytochrome P450s                           30 

*This list is intended as a general guide. The gene categories are not 
unique, and some genes are listed in multiple categories. 

agent is (or is not) responsible for eliciting a given 
biological response. This information would help to 
select a bioassay more specifically suited to the agent 
in question or perhaps suggest that a bioassay is not 
necessary, which would dramatically reduce cost, 
animal use, and time. 

The addition of microarray techniques to stan- 
dard bioassays may dramatically enhance the sen- 
sitivity and interpretability of the bioassay and 
possibly reduce its cost. Gene-expression signatures 
could be determined for various types of tissue-spe- 
cific toxicants, and new compounds could be 
screened for these characteristic signatures, provid- 
ing a rapid and sensitive in vivo test. Also, because 
gene expression is often exquisitely sensitive to low 
doses of a toxicant, the combination of gene-expres- 
sion screening and the bioassay might allow the use 
of lower toxicant doses, which are more relevant to 
human exposure levels, and the use of fewer ani- 
mals. In addition, gene-expression changes are nor- 
mally measured in hours or days, not in the months 
to years required for tumor development. Further- 
more, microarrays might be particularly useful for 
investigating the relationship between acute and 
chronic toxicity and identifying secondary effects 
of a given toxicant by studying the relationship 
between the duration of exposure to a toxicant and 
the gene-expression profile produced. Thus, a bio- 
assay that incorporates gene-expression signatures 
with traditional endpoints might be substantially 
shorter, use more realistic dose regimens, and cost 
substantially less than the current assays do. 

These considerations are also relevant for branches 
of toxicology not related to human health and not 
using rodents as model systems, such as aquatic toxi- 
cology and plant pathology. Bioassays based on the 
flathead minnow, Daphnia, and Arabidopsis could 
also be improved by the addition of microarray analysis. 
The combination of microarrays with traditional 
bioassays might also be useful for investigating some 
of the more intractable problems in toxicology research, 
such as the effects of complex mixtures and 
the difficulties in cross-species extrapolation. 

Exposure Assessment, Environmental Monitoring, 
and Drug Safety 

The currently used methods for assessment of ex- 
posure to chemical toxicants are based on measure- 
ment of tissue toxin levels or on surrogate markers 
of toxicity, termed biomarkers (e.g., peripheral blood 
levels of hepatic enzymes or DNA adducts). Because 
gene expression is a sensitive endpoint, gene expres- 
sion as measured with microarray technology may 
be useful as a new biomarker to more precisely iden- 
tify hazards and to assess exposure. Similarly, 
microarrays could be used in an environmental- 
monitoring capacity to measure the effect of poten- 
tial contaminants on the gene-expression profiles 
of resident organisms. In an analogous fashion, 
microarrays could be used to measure gene-expres- 
sion endpoints in subjects in clinical trials. The com- 
bination of these gene-expression data and more 
established toxic endpoints in these trials could be 
used to define highly precise surrogates of safety. 

Gene-expression profiles in samples from exposed 
individuals could be compared to the profiles of the 
same individuals before exposure. From this infor- 
mation, the nature of the toxic exposure can be de- 
termined or a relative clinical safety factor estimated. 
In the future it may also be possible to estimate not 
only the nature but the dose of the toxicant for a 
given exposure, based on relative gene-expression 
levels. This general approach may be particularly 
appropriate for occupational-health applications, in 
which unexposed and exposed samples from the 
same individuals may be obtainable. For example, 
a pilot study of gene expression in peripheral-blood 
lymphocytes of Polish coke-oven workers exposed 
to PAHs (and many other compounds) is under con- 
sideration at the NIEHS. An important consideration 
for these types of studies is that gene expression can 
be affected by numerous factors, including diet, 
health, and personal habits. To reduce the effects 
of these confounding factors, it may be necessary 
to compare pools of control samples with pools of 
treated samples. In the future it may be possible to 
compare exposed sample sets to a national database 
of human-expression data, thus eliminating the 
need to provide an unexposed sample from the same 
individual. Efforts to develop such a national gene- 
expression database are currently under way [44,45]. 
However, this national database approach will re- 
quire a better understanding of genome-wide gene 
expression across the highly diverse human popu- 
lation and of the effects of environmental factors 
on this expression. 
Alleles, Oligo Arrays, and Toxicogenetics 

Gene sequences vary between individuals, and 
this variability can be a causative factor in human 
diseases of environmental origin [46,47]. A new area 
of toxicology, termed toxicogenetics, was recently 
developed to study the relationship between genetic 
variability and toxicant susceptibility. This field is 
not the subject of this discussion, but it is worth- 
while to note that the ability of oligonucleotide ar- 
rays to discriminate DNA molecules based on single 
base-pair differences makes these arrays uniquely 
useful for this type of analysis. Recent reports dem- 
onstrated the feasibility of this approach [41,42]. 
The NIEHS has initiated the Environmental Genome 
Project to identify common sequence polymor- 
phisms in 200 genes thought to be involved in en- 
vironmental diseases [48]. In a pilot study on the 
feasibility of this application to the Environmental 
Genome Project, oligonucleotide arrays will be used 
to resequence 20 candidate genes. This toxicogenetic 
approach promises to dramatically improve our un- 
derstanding of interindividual variability in disease 
susceptibility. 

FUTURE PRIORITIES 

There are many issues that must be addressed be- 
fore the full potential of microarrays in toxicology 
research can be realized. Among these are model sys- 
tem selection, dose selection, and the temporal na- 
ture of gene expression. In other words, in which 
species, at what dose, and at what time do we look 
for toxicant-induced gene expression? If human 
samples are analyzed, how variable is global gene 
expression between individuals, before and after toxi- 
cant exposure? What are the effects of age, diet, and 
other factors on this expression? Experience, in the 
form of large data sets of toxicant exposures, will 
answer these questions. 

One of the most pressing issues for array scientists 
is the construction of a national public database 
(linked to the existing public databases) to serve as a 
repository for gene-expression data. This relational 
database must be made available for public use, and 
researchers must be encouraged to submit their ex- 
pression data so that others may view and query the 
information. Researchers at the National Institutes 
of Health have made laudable progress in develop- 
ing the first generation of such a database [44,45]. In 
addition, improved statistical methods for gene clus- 
tering and pattern recognition are needed to ana- 
lyze the data in such a public database. 

The proliferation of different platforms and meth- 
ods for microarray hybridizations will improve 
sample handling and data collection and analysis and 
reduce costs. However, the variety of microarray 
methods available will create problems of data compatibility 
between platforms. In addition, the near-infinite 
variety of experimental conditions under 
which data will be collected by different laboratories 
will make large-scale data analysis extremely difficult. 
To help circumvent these future problems, a 
set of standards to be included on all platforms 
should be established. These standards would facili- 
tate data entry into the national database and serve 
as reference points for cross-platform and inter-labo- 
ratory data analysis. 

Many issues remain to be resolved, but it is clear 
that new molecular techniques such as microarray 
hybridization will have a dramatic impact on toxicology 
research. In the future, the information gathered 
from microarray-based hybridization experiments will 
form the basis for an improved method to assess the 
impact of chemicals on human and environmental 
health. 
Abstract 

Recent progress in genomics and proteomics technologies has created a unique opportunity to significantly impact 
the pharmaceutical drug development processes. The perception that cells and whole organisms express specific 
inducible responses to stimuli such as drug treatment implies that unique expression patterns, molecular fingerprints, 
indicative of a drug's efficacy and potential toxicity are accessible. The integration into state-of-the-art toxicology of 
assays allowing one to profile treatment-related changes in gene expression patterns promises new insights into 
mechanisms of drug action and toxicity. The benefits will be improved lead selection, and optimized monitoring of 
[illegible] biologically relevant tissue and surrogate markers. 

© 2000 Elsevier Science Ireland Ltd. All rights reserved. 
1. Introduction 

The majority of drugs act by binding to protein 
targets, most to known proteins representing en- 
zymes, receptors and channels, resulting in effects 
such as enzyme inhibition and impairment of 
signal transduction. The treatment-induced per- 
turbations provoke feedback reactions aiming to 
compensate for the stimulus, which almost always 
are associated with signals to the nucleus, result- 
ing in altered gene expression. Such gene expres- 
sion regulations account for both the 
pharmacological action and the toxicity of a drug 
and can be visualized by either global mRNA or 
global protein expression profiling. Hence, for 
each individual drug, a characteristic gene regula- 
tion pattern, its molecular fingerprint, exists 
which bears valuable information on its mode of 
action and its mechanism of toxicity. 

Gene expression is a multistep process that 
results in an active protein (Fig. 1). There exist 
numerous regulation systems that exert control at 
and after the transcription and the translation 
step. Genomics, by definition, encompasses the 
quantitative analysis of transcripts at the mRNA 
level, while the aim of proteomics is to quantify 
gene expression further down-stream, creating a 
snapshot of gene regulation closer to ultimate cell 
function control. 
2. Global mRNA profiling 

Expression data at the mRNA level can be 
produced using a set of different technologies 
such as DNA microarrays, reverse transcript 
imaging, amplified fragment length polymorphism 
(AFLP), serial analysis of gene expression 
(SAGE) and others. Currently, DNA microarrays 
are very popular and promise a great potential. 
On a typical array, each gene of interest is represented 
either by a long DNA fragment (200-2400 
bp) typically generated by polymerase chain reaction 
(PCR) and spotted on a suitable substrate 
using robotics (Schena et al., 1995; Shalon et al., 
1996) or by several short oligonucleotides (20-30 
bp) synthesized directly onto a solid support using 
photolabile nucleotide chemistry (Fodor et al., 
1991; Chee et al., 1996). From control and treated 
tissues, total RNA or mRNA is isolated and 
reverse transcribed in the presence of radioactive 
or fluorescent labeled nucleotides, and the labeled 
probes are then hybridized to the arrays. The 
intensity of the array signal is measured for each 
gene transcript by either autoradiography or laser 
scanning confocal microscopy. The ratio between 
the signals of control and treated samples reflects 
the relative drug-induced change in transcript 
abundance. 
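
The ratio calculation mentioned above is commonly expressed on a log2 scale so that induction and repression are symmetric around zero. The sketch below is illustrative only: the direction of the ratio (treated over control), the `floor` value guarding against zero signals, and the twofold threshold are assumptions, not details given in the text, and the gene names and signals are invented.

```python
import math

def log2_ratios(control, treated, floor=1.0):
    """Return log2(treated/control) per gene; `floor` guards against zero signals."""
    ratios = {}
    for gene in control:
        c = max(control[gene], floor)
        t = max(treated[gene], floor)
        ratios[gene] = math.log2(t / c)
    return ratios

def changed_genes(ratios, threshold=1.0):
    """Genes whose absolute log2 ratio meets the threshold (1.0 = twofold)."""
    return sorted(g for g, r in ratios.items() if abs(r) >= threshold)

# Invented array signals for three placeholder genes.
control = {"A": 100.0, "B": 400.0, "C": 200.0}
treated = {"A": 400.0, "B": 100.0, "C": 220.0}
ratios = log2_ratios(control, treated)
changed = changed_genes(ratios)
```

Here gene A is induced fourfold, gene B is repressed fourfold, and gene C falls below the assumed threshold.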



3. Global protein profiling 

Global quantitative expression analysis at the 
protein level is currently restricted to the use of 
two-dimensional gel electrophoresis. This tech- 
nique combines separation of tissue proteins by 
isoelectric focusing in the first dimension and by 
sodium dodecyl sulfate slab gel electrophoresis- 
based molecular weight separation on the second, 
orthogonal dimension (Anderson et al., 1991). 
The product is a rectangular pattern of protein 
spots that are typically revealed by Coomassie 
Blue, silver or fluorescent staining (Fig. 2). 
Protein spots are identified by mass spectrometry 
following generation of peptide mass fingerprints 
(Mann et al., 1993) and sequence tags (Wilkins et 
al., 1996). Similar to the mRNA approach, the 
optical densities of spots from control and treated 
samples are compared to search for 
treatment-related changes. 



4. Expression data analysis 

Bioinformatics forms a key element required to 
organize, analyze and store expression data from 
either source, the mRNA or the protein level. The 
overall objective, once a mass of high-quality 
quantitative expression data has been collected, is 
to visualize complex patterns of gene expression 
changes, to detect pathways and sets of genes 
tightly correlated with treatment efficacy and toxi- 
city, and to compare the effects of different sets of 
treatment (Anderson et al., 1996). As the drug 
effect database is growing, one may detect similar- 
ities and differences between the molecular finger- 
prints produced by various drugs, information 
that may be crucial to make a decision whether to 
refocus or extend the therapeutic spectrum of a 
drug candidate. 



5. Comparison of global mRNA and protein 
expression profiling 

There are several synergies and overlaps of data 
obtained by mRNA and protein expression analy- 
sis. Low abundant transcripts may not be easily 
quantified at the protein level using standard two- 
dimensional gel electrophoresis analysis and their 
detection may require prefractionation of samples. 
The expression of such genes may be preferably 
quantified at the mRNA level using 
techniques allowing PCR-mediated target amplification. 
Tissue biopsy samples typically yield good 
quality of both mRNA and proteins; however, the 
quality of mRNA isolated from body fluids is 
often poor due to the faster degradation of 
mRNA when compared with proteins. RNA samples 
from body fluids such as serum or urine are 
often not very 'meaningful', and secreted proteins 
are likely more reliable surrogate markers for 
treatment efficacy and safety. Detection of post- 
translational modifications, events often related to 
function or nonfunction of a protein, is restricted 
to protein expression analysis and rarely can be 
predicted by mRNA profiling. Information on 
subcellular localization and translocation of 
proteins has to be acquired at the level of the 
protein in combination with sample prefractiona- 
tion procedures. The growing evidence of a poor 
correlation between mRNA and protein abun- 
dance (Anderson and Seilhamer, 1997) further 
suggests that the two approaches, mRNA and 
protein profiling, are complementary and should 
be applied in parallel. 



6. Expression profiling and drug development 

Understanding the mechanisms of action and 
toxicity, and being able to monitor treatment 
efficacy and safety during trials is crucial for the 
successful development of a drug. Mechanistic 
insights are essential for the interpretation of drug 
effects and enhance the chances of recognizing 
potential species specificities contributing to an 
improved risk profile in humans (Richardson et 
al., 1993; Steiner et al., 1996b; Aicher et al., 1998). 
The value of expression profiling further increases 
when links between treatment-induced expression 
profiles and specific pharmacological and toxic 
endpoints are established (Anderson et al., 1991, 
1995, 1996; Steiner et al., 1996a). Changes in gene 
expression are known to precede the manifesta- 
tion of morphological alterations, giving expres- 
sion profiling a great potential for early 
compound screening, enabling one to select drug 
candidates with wide therapeutic windows 
reflected by molecular fingerprints indicative of 
high pharmacological potency and low toxicity 
(Arce et al., 1998). In later phases of drug development, 
surrogate markers of treatment efficacy 
and toxicity can be applied to optimize the monitoring 
of pre-clinical and clinical studies (Doherty 
et al., 1998). 



7. Perspectives 

The basic methodology of safety evaluation has 
changed little during the past decades. Toxicity in 
laboratory animals has been evaluated primarily 
by using hematological, clinical chemistry and 
histological parameters as indicators of organ 
damage. The rapid progress in genomics and pro- 
teomics technologies creates a unique opportunity 
to dramatically improve the predictive power of 
safety assessment and to accelerate the drug devel- 
opment process. Application of gene and protein 
expression profiling promises to improve lead se- 
lection, resulting in the development of drug can- 
didates with higher efficacy and lower toxicity. 
The identification of biologically relevant surro- 
gate markers correlated with treatment efficacy 
and safety bears a great potential to optimize the 
monitoring of pre-clinical and clinical trials. 
DNA array technology makes it possible to rapidly genotype individuals or quantify the expression 
of thousands of genes on a single filter or glass slide, and holds enormous potential in toxicologic 
applications. This potential led to a U.S. Environmental Protection Agency-sponsored workshop 
titled "Application of Microarrays to Toxicology" on 7-8 January 1999 in Research Triangle Park, 
North Carolina. In addition to providing state-of-the-art information on the application of DNA or 
gene microarrays, the workshop catalyzed the formation of several collaborations, committees, and 
user's groups throughout the Research Triangle Park area and beyond. Potential applications of 
microarrays to toxicologic research and risk assessment include genome-wide expression analyses to 
identify gene-expression networks and toxicant-specific signatures that can be used to define mode 
of action, for exposure assessment, and for environmental monitoring. Arrays may also prove useful 
for monitoring genetic variability and its relationship to toxicant susceptibility in human populations. 
Key words: DNA arrays, gene arrays, microarrays, toxicology. Environ Health Perspect 
107:681-685 (1999). [Online 6 July 1999] 
Decoding the genetic blueprint is a dream that 
offers manifold returns in terms of understanding 
how organisms develop and function in an 
often hostile environment. With the rapid 
advances in molecular biology over the last 30 
years, the dream has come a step closer to reality. 
Molecular biologists now have the ability to 
elucidate the composition of any genome. 
Indeed, almost 20 genomes have already been 
sequenced and more than 60 are currently 
under way. Foremost among these is the 
Human Genome Mapping Project. However, 
the genomes of a number of commonly used 
laboratory species are also under intensive 
investigation, including yeast, Arabidopsis, 
maize, rice, zebra fish, mouse, rat, and dog. It 
is widely expected that the completion of such 
programs will facilitate the development of 
many powerful new techniques and approaches 
to diagnosing and treating genetically and 
environmentally induced diseases which afflict 
mankind. However, the vast amount of data 
being generated by genome mapping will 
require new high-throughput technologies to 
investigate the function of the millions of new 
genes that are being reported. Among the most 
widely heralded of the new functional 
genomics technologies are DNA arrays, which 
represent perhaps the most anticipated new 
molecular biology technique since polymerase 
chain reaction (PCR). 

Arrays enable the study of literally thousands 
of genes in a single experiment. The 
potential importance of arrays is enormous and 
has been highlighted by the recent publication 
of an entire Nature Genetics supplement dedicated 
to the technology (1). Despite this huge 
surge of interest, DNA arrays are still little used 
and largely unproven, as demonstrated by the 
high ratio of review and press articles to actual 
data papers. Even so, the potential they offer 
has driven venture capitalists into a frenzy of 
investment and many new companies are 
springing up to claim a share of this rapidly 
developing market. 

The U.S. Environmental Protection 
Agency (EPA) is interested in applying DNA 
array technology to ongoing toxicologic studies. 
To learn more about the current state of 
the technology, the Reproductive Toxicology 
Division (RTD) of the National Health and 
Environmental Effects Research Laboratory 
(NHEERL; Research Triangle Park, NC) 
hosted a workshop on "Application of 
Microarrays to Toxicology" on 7-8 January 
1999 in Research Triangle Park, North 
Carolina. The workshop was organized by 
David Dix, Robert Kavlock, and John Rockett 
of the RTD/NHEERL. Twenty-two intramural 
and extramural scientists from government, 
academia, and industry shared information, 
data, and opinions on the current and 
future applications for this exciting new technology. 
The workshop had more than 150 
attendees, including researchers, students, and 
administrators from the EPA, the National 
Institute of Environmental Health Sciences 
(NIEHS), and a number of other establishments 
from Research Triangle Park and 
beyond. Presentations ranged from the technology 
behind array production through the 
sharing of actual experimental data and projections 
on the future importance and applications 
of arrays. The information contained in 
the workshop presentations should provide aid 
and insight into arrays in general and their 
application to toxicology in particular. 

Array Elements 

In the context of molecular biology, the word 
"array" is normally used to refer to a series of 
DNA or protein elements firmly attached in 
a regular pattern to some kind of supportive 
medium. DNA array is often used interchangeably 
with gene array or microarray. 
Although not formally defined, microarray is 
generally used to describe the higher density 
arrays typically printed on glass chips. The 
DNA elements that make up DNA arrays 
can be oligonucleotides, partial gene 
sequences, or full-length cDNAs. Companies 
offering premade arrays that contain less 
than full-length clones normally use regions 
of the genes which are specific to that gene to 
prevent false positives arising through cross-hybridization. 
Sequence verification of 
cDNA clone identity is necessary because of 
errors in identifying specific clones from 
cDNA libraries and databases. Premade 
DNA arrays printed on membranes are currently 
or imminently available for human, 
mouse, and rat. In most cases they contain 
DNA sequences representing several thousand 
different sequence clusters or genes as 
delineated through the National Center for 
Biotechnology Information UniGene Project 
(2). Many of these different UniGene clusters 
(putative genes) are represented only by 
expressed sequence tags (ESTs). 

Array Printing 

Arrays are typically printed on one of two 
types of support matrix. Nylon membranes 
are used by most off-the-shelf array providers 
such as Clontech Laboratories, Inc. 
(Palo Alto, CA), Genome Systems, Inc. (St. 
Louis, MO), and Research Genetics, Inc. 
(Huntsville, AL). Microarrays such as those 
produced by Affymetrix, Inc. (Santa Clara, 
CA), Incyte Pharmaceuticals, Inc. (Palo Alto, 
CA), and many do-it-yourself (DIY) arraying 
groups use glass wafers or slides. Although 
standard microscope slides may be used, they 
must be preprepared to facilitate sticking 
of the DNA to the glass. Several different 
coatings have been successfully used, including 
silane and lysine. The coating of slides 
can easily be carried out in the laboratory, 
but many prefer the convenience of precoated 
slides available from suppliers. 

Once the support matrix has been prepared, 
the DNA elements can be applied by 
several methods. Affymetrix, Inc., has developed 
a unique photolithographic technology 
for attaching oligonucleotides to glass wafers. 
More commonly, DNA is applied by either 
noncontact or contact printing. Noncontact 
printers can use thermal, solenoid, or piezoelectric 
technology to spray aliquots of solution 
onto the support matrix and may be used to 
produce slide or membrane-based arrays. 
Cartesian Technologies, Inc. (Irvine, CA) has 
developed nQUAD technology for use in its 
PixSys printers. The system couples a syringe 
pump with the microsolenoid valve, a combination 
that provides rapid quantitative dispensing 
of nanoliter volumes (down to 4.2 nL) over 
a variable volume range. A different approach 
to noncontact printing uses a solid pin and ring 
combination (Genetic MicroSystems, Inc., 
Woburn, MA). This system (Figure 1) allows a 
broader range of sample, including cell suspensions 
and particulates, because the printing 
head cannot be blocked up in the same way as 
a spray nozzle. Fluid transfer is controlled in 
this system primarily by the pin dimensions 
and the force of deposition, although the 
nature of the support matrix and the sample 
will also affect transfer to some degree. 

In contact printing, the pin head is dipped 
in the sample and then touched to the support 
matrix to deposit a small aliquot. Split pins 
were one of the first contact-printing devices 
to be reported and are the suggested format 
for DIY arrayers, as described by Brown (3). 
Split pins are small metal pins with a precise 
groove cut vertically in the middle of the pin 
tip. In this system, 1-48 split pins are positioned 
in the pin-head. The split pins work by 
simple capillary action, not unlike a fountain 
pen: when the pin heads are dipped in the 
sample, liquid is drawn into the pin groove. A 
small (fixed) volume is then deposited each 
time the split pins are gently touched to 
the support matrix. Sample (100-500 pL 
depending on a variety of parameters) can be 
deposited on multiple slides before refilling is 
required, and array densities of > 2,500 
spots/cm2 may be produced. The deposit volume 
depends on the split size, sample fluidity, 
and the speed of printing. Split pins are 
relatively simple to produce and can be made 
in-house if a suitable machine shop is available. 
Alternatively, they can be obtained 
directly from companies such as TeleChem 
International, Inc. (Sunnyvale, CA). 
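
To give a feel for the density figure quoted above, the short calculation below converts spots per square centimeter into a center-to-center spacing. It is a back-of-the-envelope sketch that assumes a regular square grid; the 2 cm x 2 cm printed area is an arbitrary example, not a dimension from the workshop report.

```python
import math

def center_to_center_pitch_um(spots_per_cm2):
    """Spot spacing in micrometers for a square grid with the given density."""
    spots_per_cm = math.sqrt(spots_per_cm2)
    return 10_000.0 / spots_per_cm  # 1 cm = 10,000 micrometers

def spots_in_area(spots_per_cm2, width_cm, height_cm):
    """Approximate number of spots in a rectangular printed area."""
    return int(spots_per_cm2 * width_cm * height_cm)

pitch = center_to_center_pitch_um(2500)    # spacing at 2,500 spots/cm2
capacity = spots_in_area(2500, 2.0, 2.0)   # assumed 2 cm x 2 cm area
```

At 2,500 spots/cm2 the grid works out to a 200 micrometer pitch, so an assumed 4 cm2 area would hold about 10,000 spots.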

Irrespective of their source, printers 
should be run through a preprint sequence 
prior to producing the actual experimental 
arrays; the first 100 or so spots of a new run 
tend to be somewhat variable. Factors affecting 
spot reproducibility include slide treatment 
homogeneity, sample differences, and 
instrument errors. Other factors that come 
into play include clean ejection of the drop 
and clogging (nQUAD printing) and 
mechanical variations and long-term alteration 
in print-head surface of solid and split 
pins. However, with careful preparation it is 
possible to get a coefficient of variance for 
spot reproducibility below 10%. 
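
The reproducibility figure above can be checked on replicate spots with a simple coefficient-of-variation calculation. This is a minimal sketch: the intensity values are invented, and the use of the sample standard deviation is an assumption, since the report does not say how the figure was computed.

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, as a percentage."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("mean intensity is zero; CV is undefined")
    return 100.0 * statistics.stdev(values) / mean

# Invented intensities for five replicate spots of the same sample.
replicate_spots = [980.0, 1020.0, 1005.0, 995.0, 1000.0]
cv = coefficient_of_variation(replicate_spots)
passes = cv < 10.0  # the 10% target mentioned in the text
```

In practice the preprint spots from the start of a run would be excluded before computing this value, since the text notes they tend to be variable.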

One potential printing problem is sample 
carryover. Repeated washing, blotting, and 
drying (vacuum) of print pins between samples 
is normally effective at reducing sample carryover 
to negligible amounts. Printing should 
also be carried out in a controlled environ- 
ment. Humidified chambers are available in 
which to place printers. These help prevent 
dust contamination and produce a uniform 
drying rate, which is important in determining 
spot size, quality, and reproducibility. 

In summary, although several printing 
technologies are available, none are particularly 
outstanding and the bottom line 
is that they are still in a relatively early stage 
of evolution. 

Array Hybridization 

The hybridization protocol is, practically 
speaking, relatively straightforward and those 
with previous experience in blotting should 
have little difficulty. Array hybridizations 
are, in essence, reverse Southern/Northern 
blots: instead of applying a labeled probe to 
the target population of DNA/RNA, the 
labeled population is applied to the probe(s). 
With membrane-based arrays, the control and 
treated mRNA populations are normally converted 
to cDNA and labeled with isotope (e.g., 
^^P) in the process. These labeled populations 
are then hybridized independently to parallel 
or serial arrays and the hybridization signal is 
detected with a phosphorimager. A less commonly 
used alternative to radioactive probes is 
enzymatic detection. The probe may be 
biotinylated, haptenylated, or have alkaline 
phosphatase/horseradish peroxidase attached. 
Hybridization is detected by enzymatic reaction 
yielding a color reaction (4). Differences 
in hybridization signals can be detected by eye 
or, more accurately, with the help of digital 
imaging and commercially available software. 
The labeling of the test populations for slide-based 
microarrays uses a slightly different 
approach. The probe typically consists of two 
samples of polyA+ RNA (usually from a treated 
and a control population) that are converted to 
cDNA; in the process each is labeled with a 
different fluor. The independently labeled 
probes are then mixed together and hybridized 
to a single microarray slide and the resulting 
combined fluorescent signal is scanned. After 




Figure 1. Genetic Microsystems (Woburn, MA) pin ring system for printing
arrays. The pin ring combination consists of a circular open ring
oriented parallel to the sample solution, with a vertical pin centered
over the ring. When the ring is dipped into a solution and lifted, it
withdraws an aliquot of sample held by surface tension. To spot the
sample, the pin is driven down through the ring and a portion of the
solution is transferred to the bottom of the pin. The pin continues to
move downward until the pendant drop of solution makes contact with the
underlying surface. The pin is then lifted, and gravity and surface
tension cause deposition of the spot onto the array. Figure from Flowers
et al. (14), with permission from Genetic Microsystems.

normalization, it is possible to determine the ratio of fluorescent
signals from a single hybridization of a slide-based microarray.
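
The ratio calculation described above can be sketched as follows. The text does not say which normalization method is meant, so this example assumes a simple total-intensity normalization (the control channel is rescaled so that both channels have the same summed signal); the function name and the intensity values are hypothetical.

```python
import math


def normalized_log_ratios(treated, control):
    """Scale the control channel so both channels have equal total signal,
    then return log2(treated / scaled control) for each spot.

    Intensities are assumed to be positive and background-subtracted.
    """
    if len(treated) != len(control):
        raise ValueError("channels must have the same number of spots")
    scale = sum(treated) / sum(control)
    return [math.log2(t / (c * scale)) for t, c in zip(treated, control)]


# Hypothetical intensities for four spots; the control channel was
# scanned at roughly half the overall brightness of the treated channel.
treated_channel = [400.0, 100.0, 100.0, 200.0]
control_channel = [100.0, 100.0, 100.0, 100.0]
ratios = normalized_log_ratios(treated_channel, control_channel)
print(ratios)
```

A log2 ratio of 1 corresponds to a twofold increase in the treated sample, and -1 to a twofold decrease, once the overall brightness difference between channels has been removed.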

cDNA derived from control and treated populations of RNA is most
commonly hybridized to arrays, although subtractive hybridization or
differential display reactions may also be used. Fluorophore- or
radiolabeled nucleotides are directly incorporated into the cDNA in the
process of converting RNA to cDNA. Alternatively, 5' end-labeled primers
may be used for cDNA synthesis. These are labeled with a fluorophore for
direct visualization of the hybridized array. Alternatively, biotin or a
hapten may be attached to the primer, in which case fluor-labeled
streptavidin or antibody must be applied before a signal can be
generated. The most commonly used fluorophores at present are cyanine
(Cy)3 and Cy5 (Amersham Pharmacia Biotech AB, Uppsala, Sweden). However,
the relative expense of these fluorescent conjugates has driven a search
for cheaper alternatives. Fluorescein, rhodamine, and Texas red have all
been used, and companies such as Molecular Probes, Inc. (Eugene, OR) are
developing a series of labeled nucleotides with a wide range of
excitation and emission spectra which may prove to function as well as
the Cy dyes.

Table 1. Advantages and disadvantages of different microarray scanning systems.

Charge-coupled device (CCD) camera system
  Advantages:    Few moving parts
                 Fast scanning of bright samples
  Disadvantages: Less appropriate for dim samples
                 Optical scatter can limit performance

Nonconfocal laser scanner
  Advantages:    Relatively simple optics
  Disadvantages: Low light collection efficiency
                 Background artifacts not rejected
                 Resolution typically low

Confocal laser scanner
  Advantages:    Small depth of focus reduces artifacts
                 May have high light collection efficiency
  Disadvantages: Small depth of focus requires scanning precision


Analysis of DNA Microarrays

Membrane-based arrays are normally analyzed on film or with a
phosphorimager, whereas chip-based arrays require more specialized
scanning devices. These can be divided into three main groups: the
charge-coupled device camera systems, the nonconfocal laser scanners,
and the confocal laser scanners. The advantages and disadvantages of
each system are listed in Table 1.

Because a typical spot on a microarray can contain > 10^ molecules, it
is clear that a large variation in signal strength may occur. Current
scanners cannot work across this many orders of magnitude (4 or 5 is
more typical). However, the scanning parameters can normally be adjusted
to collect more or less signal, such that two or three scans of the same
array should permit the detection of rare and abundant genes.
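
One way the multiple-scan approach above might be handled in software is sketched below: a high-sensitivity scan is used wherever it is not saturated, and saturated spots are filled in from a low-sensitivity scan rescaled by the gain ratio between the two. This is an illustrative assumption, not a description of any vendor's software; the saturation ceiling, function name, and readings are all hypothetical.

```python
SATURATION = 65535  # assumed ceiling of a 16-bit detector


def merge_scans(low_gain, high_gain, saturation=SATURATION):
    """Combine two scans of the same array into one extended-range list."""
    # Estimate the gain ratio from spots that are unsaturated in the
    # high-gain scan and nonzero in the low-gain scan.
    pairs = [(lo, hi) for lo, hi in zip(low_gain, high_gain)
             if hi < saturation and lo > 0]
    if not pairs:
        raise ValueError("no unsaturated spots to estimate the gain ratio")
    gain = sum(hi for _, hi in pairs) / sum(lo for lo, _ in pairs)
    # Keep the high-gain reading where it is valid; otherwise rescale
    # the low-gain reading onto the high-gain scale.
    return [hi if hi < saturation else lo * gain
            for lo, hi in zip(low_gain, high_gain)]


# Hypothetical readings for four spots, from rare to abundant messages.
low_scan = [10, 100, 1000, 20000]
high_scan = [100, 1000, 10000, 65535]  # last spot is saturated
merged = merge_scans(low_scan, high_scan)
print(merged)
```

The merged list covers a wider range than either scan alone, which is the practical point of scanning the same array more than once.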

When a microarray is scanned, the fluorescent images are captured by
software normally included with the scanner. Several commercial
suppliers provide additional software for quantifying array images, but
the software tools are constantly evolving to meet the developing needs
of researchers, and it is prudent to define one's own needs and clarify
the exact capabilities of the software before its purchase. Issues that
should be considered include the following:

• Can the software locate offset spots?

• Can it quantitate across irregular hybridization signals?

• Can the arrayed genes be programmed in for easy identification and
location?

• Can the software connect via the Internet to databases containing
further information on the gene(s) of interest?

One of the key issues raised at the workshop was the sensitivity of
microarray technology. Experiments by General Scanning, Inc. (Watertown,
MA), have shown that by using the Cy dyes and their scanner, signal can
be detected down to levels of < 1 fluor molecule per square micrometer,
which translates to detecting a rare message at approximately one copy
per cell or less.

Array Applications 

Although arrays are an emerging technology certain to undergo
improvement and alteration, they have already been applied usefully to a
number of model systems. Arrays are at their most powerful when they
contain the entire genome of the species they are being used to study.
For this reason, they have strong support among researchers utilizing
yeast and Caenorhabditis elegans (5). The genomes of both of these
species have been sequenced and, in the case of yeast, deposited onto
arrays for examination of gene expression (6,7). With both of these
species, it is relatively easy to perturb individual gene expression.
Indeed, C. elegans knockouts can be made simply by soaking the worms in
an antisense solution of the gene to be knocked out.

Table 1 notes: CCD, charge-coupled device. From Kawasaki (13).

By a process of systematic gene disruption, it is now possible to
examine the cause and effect relationships between different genes in
these simple organisms. This kind of approach should help elucidate
biochemical pathways and genetic control processes, deconvolute
polygenic interactions, and define the architecture of the cellular
network. A simple case study of how this can be achieved was presented
by Butow [University of Texas Southwestern Medical Center, Dallas, TX
(Figure 2)]. Although it is the phenotypic result of a single gene
knockout that is being examined, the effect of such perturbation will
almost always be polygenic. Polygenic interactions will become
increasingly important as researchers begin to move away from single
gene systems when examining the nature of toxicologic responses to
external stimuli. This is especially important in toxicology because the
phenotype produced by a given environmental insult is never the result
of the action of a single gene; rather, it is a complex interaction of
one or multiple cellular pathways. Phenomena such as quantitative trait
(the continuous variation of phenotype), epistasis (the effect of
alleles of one or more genes on the expression of other genes), and
penetrance (proportion of individuals of a given genotype that display a
particular phenotype) will become increasingly evident and important as
toxicologists push toward the ultimate goal of matching the responses of
individuals to different environmental stimuli.

Analysis of the transcriptome (the expression level of all the genes in
a given cell population) was a use of arrays addressed by several
speakers. Unfortunately, current gene nomenclature is often confusing in
that single genes are allocated multiple names (usually as a result of
independent discovery by different laboratories), and there was a call
for standardization of gene nomenclature. Nevertheless, once a
transcriptome has been assembled it can then be transferred onto arrays
and used to screen any chosen system. The EPA MicroArray Consortium
(EPAMAC) is assembling testes transcriptomes for human, rat, and mouse.
In a slightly different approach, Nuwaysir et al. (8) describe how the
NIEHS assembled what is effectively a "toxicological transcriptome": a
library of human and mouse genes that have previously been proven or
implicated in responses to toxicologic insults. Clontech Laboratories,
Inc. (Palo Alto, CA), has begun a similar process by developing
stress/toxicology filter arrays of rat, mouse, and human genes. Thus,
rather than being tissue or cell specific, these stress/toxicology
arrays can be used across a variety of model systems to look for
alterations in the expression of toxicologically important genes and
define the new field of toxicogenomics. The potential to identify
toxicant families based on tissue- or cell-specific gene expression
could revolutionize drug testing. These molecular signatures or
fingerprints could not only point to the possible
toxicity/carcinogenicity of newly discovered compounds (Figure 3), but
also aid in elucidating their mechanism of action through identification
of gene expression networks. By extension, such signatures could provide
easily identifiable biomarkers to assess the degree, time, and nature of
exposure.

DNA arrays are primarily a tool for examining differential gene
expression in a given model. In this context they are referred to as
closed systems because they lack the ability of other differential
expression technologies, e.g., differential display and subtractive
hybridization, to detect previously unknown genes not present on the
array. This would appear to limit the power of DNA arrays to the
imaginations and preconceptions of the researcher in selecting genes
previously characterized and thought to be involved in the model system.
However, the various genome sequencing projects have created a new
category of sequence, the EST, that has partially mollified this
deficiency. ESTs are cDNAs expressed in a given tissue that, although
they may share some degree of sequence similarity to previously
characterized genes, have not been assigned specific genetic identity.
By incorporating EST clones into an array, it is possible to monitor the
expression of these unknown genes. This can enable the identification of
previously uncharacterized genes that may have biologic significance in
the model system. Filter arrays from Research Genetics and slide arrays
from Incyte Pharmaceuticals both incorporate large numbers of ESTs from
a variety of species.

A further use of microarrays is the identification of single nucleotide
polymorphisms (SNPs). These genomic variations are abundant (they occur
approximately every 1 kb or so) and are the basis of restriction
fragment length polymorphism analysis used in forensic analysis.
Affymetrix, Inc., designed chips that contain multiple repeats of the
same gene sequence. Each position is present with all four possible
bases. After the hybridization of the sample, the degree of
hybridization to the different sequences can be measured and the exact
sequence of the target gene deduced. SNPs are thought to be of vital
importance in drug metabolism and toxicology. For example, single base
differences in the regulatory region or active site of some genes can
account for huge differences in the activity of that gene. Such SNPs are
thought to explain why some people are able to metabolize certain
xenobiotics better than others. Thus, arrays provide a further tool for
the toxicologist investigating the nature of susceptible subpopulations
and toxicologic response.
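
The sequence-deduction step described above (each position probed with all four bases, with the strongest hybridization taken as the call) can be illustrated with a toy example. This is a deliberately simplified sketch, not the algorithm used by any commercial chip; the function name, intensities, and reference sequence are hypothetical.

```python
def call_bases(intensities_by_position):
    """Pick, at each position, the base whose probe hybridized most strongly."""
    return "".join(max(signals, key=signals.get)
                   for signals in intensities_by_position)


# Hypothetical hybridization intensities for three interrogated positions;
# each dictionary holds the signal from the four probes differing only at
# that position.
probe_signals = [
    {"A": 950, "C": 40, "G": 60, "T": 30},
    {"A": 20, "C": 35, "G": 880, "T": 45},
    {"A": 25, "C": 30, "G": 50, "T": 910},
]
called = call_bases(probe_signals)

# Compare against a hypothetical reference sequence to flag candidate SNPs.
reference = "AGC"
snp_positions = [i for i, (c, r) in enumerate(zip(called, reference)) if c != r]
print(called, snp_positions)
```

A real analysis would also need to handle ambiguous positions where two probes give similar signals, for example in heterozygous samples.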

There are still many wrinkles to be ironed out before arrays become a
standard tool for toxicologists. The main issues raised at the workshop
by those with hands-on experience were the following:

• Expense: the cost of purchasing/contracting this technology is still
too great for many individual laboratories.




Figure 2. Potential effects of gene knockout within positively and
negatively regulated gene expression networks. (A) A simple,
two-component linear regulatory network operating on a target gene, in
which one component is a positive effector and the other is either a
positive or negative effector. This network could be deduced by
examining the consequence of (B) deleting a component on the expression
of the others, where expression would be decreased or increased
depending on whether the deleted component was a positive or negative
regulator. These and other connected components of even greater
complexity could be revealed by genome-wide expression analysis. From
Butow (15).



• Clones: the logistics of identifying, obtaining, and maintaining a set
of nonredundant, noncontaminated, sequence-verified,
species/cell/tissue/field-specific clones.

• Use of inbred strains: where whole-organism models are being used, the
use of inbred strains is important to reduce the potentially confusing
effects of the individual variation typically seen in outbred
populations.

• Probe: the need for relatively large amounts of RNA, which limits the
type of sample (e.g., biopsy) that can be used. Also, different RNA
extraction methods can give different results.

• Specificity: the ability to discriminate accurately between closely
related genes (e.g., the cytochrome p450 family) and splice variants.

• Quantitation: the quantitation of gene expression using gene arrays is
still open to debate. One reason for this is the different incorporation
of the labeling dyes. However, the main difficulty lies in knowing what
to normalize against. One option is to include a large number of
so-called housekeeping genes in the array. However, the expression of
these genes often changes depending on the tissue and the toxicant, so
it is necessary to characterize the expression of these genes in the
model system before utilizing them. This is clearly not a viable option
when screening multiple new compounds. A second option is to include on
the array genes from a nonrelated species (e.g., a plant gene on an
animal array) and to spike the probe with synthetic RNA(s) complementary
to the gene(s).

• Reproducibility: this is sometimes questionable, and a figure of
approximately two or three repeats was used as the minimum number
required to confirm initial findings. Again, however, most people
advocated the use of Northern blots or reverse transcriptase PCR to
confirm findings.

• Sensitivity: concerns were voiced about the number of target molecules
that must be present in a sample for them to be detected on the array.

• Efficiency: reproducible identification of 1.5- to 2-fold differences
in expression was reported, although the number of genes that undergo
this level of change and remain undetected is open to debate. It is
important that this level of detection be ultimately achieved because it
is commonly perceived that some important transcription factors and
their regulators respond at such low levels. In most cases, 3- to 5-fold
was the minimum change that most were happy to accept.

• Bioinformatics: perhaps the greatest concern was how to interpret the
data with the greatest accuracy and efficiency. The biggest headache is
trying to identify networks of gene expression that are common to
different treatments or doses. The amount of data from a single
experiment is huge. It may be that, in the future, several groups
individually equipped with specialized software algorithms for studying
their favorite genes or gene systems will be able to share the same
hybridized chips. Thus, arrays could usher in a new perspective on
collaboration and the sharing of data.
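
Two of the points in the list above, normalizing against spiked-in control RNA from an unrelated species and accepting only changes above a minimum fold threshold, can be combined in a short sketch. It is offered only as an illustration under stated assumptions: the gene names, intensities, and function names are hypothetical, and the 3-fold default simply echoes the lower end of the range the workshop participants said they were happy to accept.

```python
def spike_normalize(intensities, spike_ids):
    """Divide every spot by the mean intensity of the spiked control spots."""
    spike_mean = sum(intensities[g] for g in spike_ids) / len(spike_ids)
    return {g: v / spike_mean for g, v in intensities.items()}


def changed_genes(control, treated, spike_ids, min_fold=3.0):
    """Return genes whose normalized expression changes by at least min_fold
    in either direction, mapped to their treated/control fold change."""
    c = spike_normalize(control, spike_ids)
    t = spike_normalize(treated, spike_ids)
    hits = {}
    for gene in control:
        if gene in spike_ids:
            continue  # the spikes are controls, not genes of interest
        fold = t[gene] / c[gene]
        if fold >= min_fold or fold <= 1.0 / min_fold:
            hits[gene] = fold
    return hits


# Hypothetical arrays: the treated array is about twice as bright overall,
# which the spike-in controls reveal and correct for.
spikes = {"plant_spike_1", "plant_spike_2"}
control_array = {"plant_spike_1": 500.0, "plant_spike_2": 1500.0,
                 "geneA": 200.0, "geneB": 800.0, "geneC": 400.0}
treated_array = {"plant_spike_1": 1000.0, "plant_spike_2": 3000.0,
                 "geneA": 1600.0, "geneB": 1600.0, "geneC": 200.0}
hits = changed_genes(control_array, treated_array, spikes)
print(hits)
```

In this made-up data set, geneB doubles in raw intensity but is unchanged after normalization, which shows why the choice of what to normalize against matters.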

EPAMAC 

Perhaps the main reason most scientists are unable to use array
technology is the high cost involved, whether buying off-the-shelf
membranes, using contract printing services, or producing chips
in-house. In view of this, researchers at the RTD/NHEERL initiated the
EPAMAC. This consortium brings together scientists from the EPA and a
number of extramural labs with the aim of developing microarray
capability through the sharing of resources and data. EPAMAC researchers
are primarily interested in the developmental and toxicologic changes
seen in testicular and breast tissue, and a portion of the workshop was
set aside for EPAMAC members to share their ideas on how the
experimental application of microarrays could facilitate their research.
One of the central areas of interest to EPAMAC members is the effect of
xenobiotics on male fertility and reproductive health. Of greatest
concern is the effect of exposure during critical periods of development
and germ cell differentiation (9) and how this may compromise sperm
counts and quality following sexual maturation (10). As well as
spermatogenic tissue, there is also interest in how residual mRNA found
in mature sperm (11) could be used as an indicator of previous
xenobiotic effects (it is easier to obtain a semen sample than a
testicular biopsy). Arrays will be used to examine and compare the
effect of exposure to heat and chemicals in testicular and epididymal
gene expression profiles, with the aim of establishing
relationships/associations between changes in developmental landmarks
and the effects on sperm count and quality. Cluster, pattern, and other
analysis of such data should help identify hidden relationships between
genes that may reveal potential mechanisms of action and uncover roles
for genes with unknown functions.

Figure 3. Gene expression profiles (also called fingerprints or
signatures) of known toxicants or toxicant families may, in the future,
be used to identify the potential toxicity of new drugs, etc. In this
example, the genetic signature of test compound 1 is identical to that
of known peroxisome proliferators, whereas that of test compound 2 does
not match any known toxicant family. Based on these results, test
compound 2 would be retained for further testing and test compound 1
would be eliminated.

Summary 

The full impact of DNA arrays may not be seen for several years, but the
interest shown at this regional workshop indicates the high level of
interest that they foster. Apart from educating and advertising the
various technologies in this field, this workshop brought together a
number of researchers from the Research Triangle Park area who are
already using DNA arrays. The interest in sharing ideas and experiences
led to the initiation of a Triangle array user's group.

Array technology is still in its infancy. This means that the hardware
is still improving and there is no current consensus for standard
procedures, quantitation, and interpretation. Consistency in spotting
and scanning arrays is not yet optimized, and this is one of the most
critical requirements of any experiment. In addition, one of the dark
regions of array technology, strife in the courts over who owns what
portions of it, has further muddled the future and is a potential
barrier toward the development of consensus procedures.

Perhaps the greatest hurdle for the application of arrays is the actual
interpretation of data. No specialists in bioinformatics attended the
workshop, largely because they are rare and because as yet no one seems
clear on the best method of approaching data analysis and
interpretation. Cross-referencing results from multiple experiments
(time, dose, repeats, different animals, different species) to identify
commonly expressed genes is a great challenge. In most cases, we are
still a long way from understanding how the expression of gene X is
related to the expression of gene Y and ordering gene expression to
delineate causal relationships.

To the ordinary scientist in the typical laboratory, however, the most
immediate problem is a lack of affordable instrumentation. One can
purchase premade membranes at relatively affordable prices. Although
these may be useful in identifying individual genes to pursue in more
detail using other methods, the numbers that would be required for even
a small routine toxicology experiment prohibit this as a truly viable
approach. For the toxicologist, there is a need to carry out multiple
experiments: dose responses, time curves, multiple animals, and repeats.
Glass-based DNA arrays are most attractive in this context because they
can be prepared in large batches from the same DNA source and
accommodate control and treated samples on the same chip. Another
problem with current off-the-shelf arrays is that they often do not
contain one or more of the particular genes a group is interested in.
One alternative is to obtain or produce a set of custom clones and have
contract printing of membranes or slides carried out by a company such
as Genomic Solutions, Inc. (Ann Arbor, MI). This approach is less
expensive than laying out capital for one's own entire system, although
at some point it might make economic sense to print one's own arrays.

Finally, DNA arrays are currently a team effort. They are a technology
that uses a wide range of skills including engineering, statistics,
molecular biology, chemistry, and bioinformatics. Because most
individuals are skilled in only one or perhaps two of these areas, it
appears that success with arrays may be best expected by teams of
collaborators consisting of individuals having each of these skills.

Those considering array applications may be amused or goaded on by the
following quote from Fortune magazine (12):

Microprocessors have reshaped our economy, spawned vast fortunes and
changed the way we live. Gene chips could be even bigger.

Although this comment may have been designed to excite the imagination
rather than accurately reflect the truth, it is fair to say that the age
of functional genomics is upon us. DNA arrays look set to be an
important tool in this new age of biotechnology and will likely
contribute answers to some of toxicology's most fundamental questions.

References and Notes

1. The chipping forecast. Nat Genet 21(suppl) (1999).

2. National Center for Biotechnology Information. The Unigene System.
Available: www.ncbi.nlm.nih.gov/Schuler/UniGene [cited 22 March 1999].

3. Brown PO. The Brown Lab. Available: http://cmgm.stanford.edu/pbrown
[cited 22 March 1999].

4. Chen JJ, Wu R, Yang PC, Huang JY, Sher YP, Han MH, Kao WC, Lee PJ,
Chiu TF, Chang F, et al. Profiling expression patterns and isolating
differentially expressed genes by cDNA microarray system with
colorimetry detection. Genomics 51:313-324 (1998).

5. Ward S. DNA Microarray Technology to Identify Genes Controlling
Spermatogenesis. Available: www.mcb.arizona.edu/Wardlab/microarray.html
[cited 22 March 1999].

6. Marton MJ, DeRisi JL, Bennett HA, Iyer VR, Meyer MR, Roberts CJ,
Stoughton R, Burchard J, Slade D, Dai H, et al. Drug target validation
and identification of secondary drug target effects using DNA
microarrays. Nat Med 4:1293-1301 (1998).

7. Brown PO. The Full Yeast Genome on a Chip. Available:
http://cmgm.stanford.edu/pbrown/yeastchip.html [cited 22 March 1999].

8. Nuwaysir EF, Bittner M, Trent J, Barrett JC, Afshari CA. Microarrays
and toxicology: the advent of toxicogenomics. Mol Carcinog 24(3):153-159
(1999).

9. Hecht NB. Molecular mechanisms of male germ cell differentiation.
Bioessays 20:555-561 (1998).

10. Zacharewski TR. Timothy R. Zacharewski. Available:
www.bch.msu.edu/faculty/zachar.htm [cited 22 March 1999].

11. Kramer JA, Krawetz SA. RNA in spermatozoa: implications for the
alternative haploid genome. Mol Hum Reprod 3:473-478 (1997).

12. Stipp D. Gene chip breakthrough. Fortune, March 31:56-73 (1997).

13. Kawasaki E (General Scanning Instruments, Inc., Watertown, MA).
Unpublished data.

14. Flowers P, Overbeck J, Mace ML Jr, Pagliughi FM, Eggers WJE, Yonkers
H, Honkanen P, Montagu J, Rose SD. Development and Performance of a
Novel Microarraying System Based on Surface Tension Forces. Available:
http://www.geneticmicro.com/resources/html/coldspring.html [cited 22
March 1999].

15. Butow R (University of Texas Medical Center, Dallas, TX).
Unpublished data.



Environmental Health Perspectives • Volume 107, Number 8, August 1999



685 



[Fwd: Toxicology Chip]

Docket No.: PC-0044 CIP
USSN: 09/895,686
Ref. No. 5 of 6

Subject: RE: [Fwd: Toxicology Chip]
Date: Mon, 3 Jul 2000 08:09:45 -0400
From: "Afshari, Cynthia" <afshari@niehs.nih.gov>
To: "'Diana Hamlet-Cox'" <dianahc@incyte.com>

You can see the list of clones that we have on our [...]
[...]
selected a subset of genes [...] that we [...]
response and basic cellular processes and added [...]
[...]. We have included a set of control genes [...]
[...] have found that some of these genes [...]
significantly after tox treatments and are in the process [...]
variation of each of these 80+ genes across our experiments.
Our chips are constantly changing and being updated and we hope
that [...] toxchip should really be
[...]



hope this answers your question
Cindy Afshari

> From: Diana Hamlet-Cox
> Sent: Monday, June 26, 2000 8:52 PM
> To: afshari@niehs.nih.gov
> Subject: [Fwd: Toxicology Chip]
>
> Dear Dr. Afshari,
>
> Since I have not yet had a response from [...]
> the right person to contact. Perhaps he was not [...]
>
> Can you help me in this matter? I don't need to [...]
>
> Diana Hamlet-Cox
>
> Original Message
> Subject: Toxicology Chip
> Date: Mon, 29 Jun 2000 18:31:48 -0700
> From: Diana Hamlet-Cox <dianahc@incyte.com>
> Organization: Incyte Pharmaceuticals
> To: grigg@niehs.nih.gov
>
> Dear Colleague:
>
> Thank you for your assistance in this [...]
>
> This email message is for the sole use of the intended recipient(s) and
> may contain confidential and privileged information subject to
> attorney-client privilege. Any unauthorized review, use, disclosure or
> distribution is prohibited. If you are not the intended recipient,
> please contact the sender by reply email and destroy all copies of the
> original [...]
>

07/31/2000 10:34 AM



research focus 



Docket No.: PC-0044 CIP
USSN: 09/895,686
Ref. No. 6 of 6

REVIEWS 

Proteomics: a major new 
technology for the drug 
discovery process 

Martin J. Page, Bob Amess, Christian Rohlff, Colin Stubberfield 
and Raj Parekh 



Proteomics is a new enabling technology that is being integrated into
the drug discovery process. This will facilitate the systematic analysis
of proteins across any biological system or disease, forwarding new
targets and information on mode of action, toxicology and surrogate
markers. Proteomics is highly complementary to genomic approaches in the
drug discovery process and, for the first time, offers scientists the
ability to integrate information from the genome, expressed mRNAs, their
respective proteins and subcellular localization. It is expected that
this will lead to important new insights into disease mechanisms and
improved drug discovery strategies to produce novel therapeutics.

Among the major pharmaceutical and biotechnol- 
ogy companies, it is clearly recognized that the 
business of modern drug discovery is a highly 
competitive process. All of the many steps in- 
volved are inherently complex, and each can involve a 
high risk of attrition. The players in this business strive 
continuously to optimize and streamline the process; each 
seeking to gain an advantage at every step by attempting 
to make informed decisions at the earliest stage possible. 
The desired outcome is to accelerate as many key activities in the drug
discovery process as possible. This should produce a new generation of
robust drugs that offer a high probability of success and reach the
clinic and market ahead of the competition.

There has been noticeable emphasis over recent years 
for companies to aggressively review and refine their 
strategies to discover new drugs. Central to this has been 
the introduction and implementation of cutting-edge 
technologies. Most, if not all, companies have now inte- 
grated key technology platforms that incorporate gen- 
omics, mRNA expression analysis, relational databases, 
high-throughput robotics, combinatorial chemistry and 
powerful bioinformatics. Although it is still early days to 
quantify the real impact of these platforms in clinical and 
commercial terms, expectations are high, and it is widely 
accepted that significant benefits will be forthcoming. This 
is largely based on data obtained during preclinical studies
where the genomic and microarray technologies have
already proved their value.

However, there are several noteworthy outcomes that result from this.
Many comments are voiced that scientists armed with these technologies
are now commonly faced with data overload. Thus, in some instances,
rather than facilitating the decision process, the accumulation of more
complex data points, many with unknown consequences, can seem to hinder
the process. Also, most drug companies have simultaneously incorporated
very similar components of the new technology platforms, the consequence
being that it is becoming difficult yet again to determine where a clear
competitive advantage will arise. Finally, in recent years, largely as a
result of the accessibility of the technologies, there has been an
overwhelming emphasis placed on genomic and mRNA data rather than on
protein analysis. It is important to remember that proteins dictate
biological phenotype - whether it is normal or diseased - and are the
direct targets for most drugs.

Figure 1. Steps involved in analysing a biological sample by proteomics.
MCI, molecular cluster index.

Proteomics: new technology for 
the analysis of proteins 

It is now timely to recognize that complementary technol- 
ogy in the form of high-throughput analysis of the total 
protein repertoire of chosen biological samples, namely 
proteomics, is poised to add a new and important dimen- 
sion to drug discovery. In a similar fashion to genomics, 
which aims to profile every gene expressed in a cell, pro-
teomics seeks to profile every protein that is expressed.
However, there is added information, since proteomics can
also be used to identify the post-translational modifications
of proteins, which can have profound effects on bio-
logical function, and their cellular localization. Importantly,
proteomics is a technology that integrates the significant 
advances in two-dimensional (2D) electrophoretic separa- 
tion of proteins, mass spectrometry and bioinformatics. 
With these advances it is now possible to consistently de- 
rive proteomes that are highly reproducible and suitable 
for interrogation using advanced bioinformatic tools. 

There are many variations whereby different laboratories 
operate proteomics. For the purpose of this review, the
process used at Oxford GlycoSciences (OGS), which uses
an industrial-scale operation that is integral to its drug dis- 
covery work, will be described. The individual steps of 
this process, where up to 1000 2D gels can be run and 
analysed per week, are summarized in Fig. 1. The incom- 
ing samples are bar coded and all information relevant to 
the sample is logged into a Laboratory Information 
Management System (LIMS) database. There can be a wide 
range in the type of samples processed, as applicable to 
individual steps in the drug discovery pipeline, and these 
will be mentioned later. The samples are separated accord-
ing to their charge (pI) in the first dimension, using iso-
electric focusing, followed by size (MW) using SDS-PAGE
in the second dimension. Many modifications have been 
made to these steps to improve handling, throughput and 
reproducibility. The separated proteins are then stained 
with fluorescent dyes which are significantly more sensi- 
tive in detection than standard silver methods and have a 
broader dynamic range. The image of the displayed pro-
teins obtained is referred to as the proteome, and is digi-
tally scanned into databases using proprietary software
called ROSETTA™. The images are subsequently curated,
which begins with the removal of any artefacts, cropping
and the placement of pI/MW landmarks. The
replicate images are then aligned and matched to one
another to generate a synthetic composite image. This is
an important step, as the proteome is a dynamic situation, 
and it captures the biological variation that occurs, such 
that even orphan proteins are still incorporated into the 
analysis. 

By means of illustration, Fig. 1 shows the process 
whereby proteomes are generated from normal and dis- 
ease samples and how differentially expressed proteins are 
identified. The potential of this type of analysis is tremen- 
dous. For example, from a mammalian cell sample, in ex- 
cess of 2000 proteins can typically be resolved within the 
proteome. The quality of this is shown in Fig. 2, which 
shows representative proteomes from three diverse bio- 
logical sources: human serum, the pathogenic fungus 
Candida albicans and the human hepatoma cell line 
Huh7. 

Use of proteomics to identify
disease specific proteins 

In most cases, the drug discovery process is initiated by 
the identification of a novel candidate target - almost al- 
ways a protein - that is believed to be instrumental in the 
disease process. To date, there is a variety of means 
whereby drug targets have been forthcoming. These in- 
clude molecular, cellular and genomic approaches, mostly 
centred upon DNA and mRNA analysis. The gene in ques- 
tion is isolated, and expression and characterization of its 
coded protein product - i.e. the drug target - is invariably 
a secondary event. 

With the proteomic approach, the starting point is at the 
other end of the 'telescope'. Here there is direct and im-
mediate comparison of the proteomes from paired normal
and disease materials. Examples of these pairs are: (1) pu- 
rified epithelial cell populations derived from human 
breast tumours, matched to purified normal populations of 
human breast epithelial cells, and (2) the invading patho- 
genic hyphal form of C. albicans, matched to the non- 
invading yeast form of C. albicans. When the proteome 
images from each pair are aligned, the Proteograph™ soft-
ware is able to rapidly identify those proteins (each refer-
enced as having a unique molecular cluster index, or MCI)
that are either unique, or those that are differentially ex- 
pressed. Thus, the Proteograph output from this analysis is 
both qualitative and quantitative. 

Proteograph analysis for a particular study can also be 
undertaken on any number of samples. For example, one 
might compare anything from a few to several hundred 
preparations or samples, each from a normal and disease 
counterpart, and have these analysed in a single 
Proteograph study. In this way, it is possible to assign 
strong statistical confidence to the data and in some in- 
stances to identify specific subpopulations within the input 
biological sources. This feature will become increasingly 
significant in the near future, and there is a clear synergy 
here whereby proteomics can work closely with pharma- 
cogenomic approaches to stratify patient populations and 
achieve effective targeted care for the patient. Whatever 
the source of the materials, the net output of Proteograph 
analysis is immediate identification of disease specific pro- 
teins. This is shown in Fig. 3, which shows the results of 
a proteograph obtained by comparing untreated human 
hepatoma cells with cells following exposure to a clinical
cytotoxic agent. In this instance, only the top 20 differen-
tially expressed MCIs are shown, but the readout would
normally extend to a defined cut-off value, typically a two-
fold or greater difference in expression levels, determined
by the user.

Figure 2. Representative proteomes obtained from (a) human serum, (b) the pathogenic fungus Candida albicans
and (c) the human hepatoma cell line Huh7.

Figure 3. Table of differential protein expression
profiles, referred to as a Rosetta Proteograph™,
between Huh7 cells with and without the cytotoxic
agent 5-FU. Foregrounds: Huh7 cells treated with 5-FU.
Backgrounds: Huh7 cells untreated. Bars mark features
upregulated or downregulated in Huh7 cells treated with
5-FU with respect to untreated Huh7 cells. Bars are
quantized and do not represent exact fold change values.
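
As a minimal sketch of the cut-off step described above, the Python snippet below filters a set of features by a user-chosen fold-change threshold and ranks the survivors. The function names, the dictionary layout and the intensity values are hypothetical illustrations; they are not taken from the Proteograph software, whose internals are not described in this review.

```python
import math


def fold_change(reference, sample):
    """Signed fold change of sample versus reference.

    Positive values mean upregulation, negative values downregulation.
    A feature present in only one proteome is reported as +/- infinity.
    """
    if reference == 0 and sample == 0:
        return 1.0  # absent from both proteomes: treated as no change
    if reference == 0:
        return math.inf
    if sample == 0:
        return -math.inf
    ratio = sample / reference
    return ratio if ratio >= 1 else -1 / ratio


def differential_features(intensities, cutoff=2.0, top=None):
    """Return (MCI, fold change) pairs at or beyond the cutoff, largest first."""
    hits = [(mci, fold_change(ref, smp)) for mci, (ref, smp) in intensities.items()]
    hits = [(mci, fc) for mci, fc in hits if abs(fc) >= cutoff]
    hits.sort(key=lambda hit: abs(hit[1]), reverse=True)
    return hits if top is None else hits[:top]


# Hypothetical intensities: MCI -> (untreated, treated)
example = {101: (100.0, 250.0), 102: (80.0, 90.0), 103: (60.0, 20.0), 104: (0.0, 40.0)}
print(differential_features(example, cutoff=2.0, top=20))
```

Reporting a signed value keeps up- and downregulated features in one ranked list, which mirrors the single table of Fig. 3; a real analysis would also need replicate statistics, which this sketch omits.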

In a typical analysis involving disease and normal mam- 
malian material, in which each proteome would have 
~2000 protein features each assigned an MCI, the proteo-
graph might identify somewhere in the region of 50-300
MCIs that are unique or differentially expressed. To capi-
talize rapidly on these data, OGS applies a high-throughput
mass spectrometry facility, coupled to advanced databases,
to annotate these MCIs as individual proteins. As
these are all disease specific proteins, each could represent 
a novel target and/or a novel disease marker. The process 
becomes even more powerful when a panel of features,
rather than individual features, is assigned. The relevance
of this is apparent when one considers that most diseases, 
if not all, are multifactorial in nature and arise from poly- 
genic changes. Rather than analysing events in isolation, 
the ability to examine hundreds or thousands of events 
simultaneously, as shown by proteomics, can offer real 
advantages. 

Identification and assignment of candidate targets 
The rapid identification and assignment of candidate tar- 
gets and markers represents a huge challenge, but this has 
been greatly facilitated by combining the recent advances 
made in proteomics and analytical mass spectrometry.
Using automated procedures it is now possible to annotate
proteins present in femtomole quantities, which would rep-
resent the low abundance class of proteins. The process of
annotation is similarly aided by the quality and richness of 
the sequence specific databases that are currently avail- 
able, both in the public domain and in the private sector 
(e.g. those supplied by Incyte Pharmaceuticals). In this re-
spect, the advances in proteomics have benefited consider-
ably from the breakthroughs achieved with genomics.

From an application perspective, cancer studies provide a 
good opportunity whereby proteomics can be instrumental 
in identifying disease specific proteins, because it is often 
feasible to obtain normal and diseased tissue from the same 
patient. For example, proteomic studies have been re-
ported on neuroblastomas, human breast proteins from
normal and tumour sources, lung tumours, colon tu-
mours and bladder tumours. There are also proteomic
studies reported within the cardiovascular therapeutic area,
in which disease or response proteins are identified.

Genomic microarray analysis can similarly identify 
unique species or clusters of mRNAs that are disease spe- 
cific. However, in some instances, there is a clear lack of 
correlation between the levels of a specific mRNA and its 
corresponding protein (Ref. 19, Gygi, S.P. et al., submit-
ted). This has now been noted by many investigators and
reaffirms that post-transcriptional events, including protein 
stability, protein modification (such as phosphorylation, 
glycosylation, acylation and methylation) and cell localiz- 
ation, can constitute major regulatory steps. Proteomic 
analysis captures all of these steps and can therefore pro- 
vide unique and valuable information independent from, 
or complementary to, genomic data.

Proteomics for target validation and signal transduc-
tion studies

The identification of disease specific proteins alone is in-
sufficient to begin a drug screening process. It is critical to
assign function and validation to these proteins by con- 
firming they are indeed pivotal in the disease process. 
These studies need to encompass both gain- and loss-of- 
function analyses. This would determine whether the activity 
of a candidate target (an enzyme, for example), eliminated 
by molecular/cellular techniques, could reverse a disease 
phenotype. If this happened, then the investigator would 
have increased confidence that a small-molecule inhibitor 
against the target would also have a similar effect. The 
proposal of candidate drug targets is often not a difficult 
process, but validating them is another matter. Validation 
represents a major bottleneck where the wrong decision 
can have serious consequences.

Proteomics can be used to evaluate the role of a chosen 
target protein in signal transduction cascades directly rel- 
evant to the disease. In this manner, valuable information 
is forthcoming on the signalling pathways that are per- 
turbed by a target protein and how they might be cor- 
rected by appropriate therapeutics. Techniques that are 
well established in one-dimensional protein studies to in- 
vestigate signalling pathways, such as western blotting 
and immunoprecipitation, are highly suited to proteomic 
applications. For example, the proteomes obtained can be 
blotted onto membranes and probed with antibodies 
against the target protein or related signalling mol-
ecules. Because proteomics can resolve >2000 pro-
teins on a single gel, it is possible to derive important
information on specific isoforms (such as glycosylated or 
phosphorylated variants) of signalling molecules. This will 
result in characterization of how they are altered in the 
disease process. Western immunoblotting techniques 
using high-affinity antibodies will typically identify pro-
teins present at ~10 copies per cell (~1.7 fmol); this is in
contrast to the best fluorescent dyes currently available 
that are limited to imaging proteins at 1000 or more 
copies per cell. The level of sensitivity derived by these 
applications will greatly facilitate interpretation of com- 
plex signalling pathways and contribute significantly to 
validation of the target under study. 
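
The relationship between copies per cell and the molar amount loaded on a gel can be checked with a short calculation. The sketch below is illustrative only: the review does not state how many cells a sample contains, so the figure of 10^8 cells is an assumption chosen because it reproduces the roughly 1.7 fmol quoted above for 10 copies per cell. The function name is hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def femtomoles(copies_per_cell, n_cells):
    """Amount of one protein, in fmol, contributed by n_cells cells."""
    molecules = copies_per_cell * n_cells
    return molecules / AVOGADRO * 1e15


ASSUMED_CELLS = 1e8  # assumption: sample size is not given in the text

# Immunoblot sensitivity quoted in the text: about 10 copies per cell
print(round(femtomoles(10, ASSUMED_CELLS), 2))
# Fluorescent dye sensitivity quoted in the text: 1000 or more copies per cell
print(round(femtomoles(1000, ASSUMED_CELLS), 1))
```

Under the same assumption, the hundredfold gap in copies per cell between the two detection methods translates directly into a hundredfold gap in the amount of protein required.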

Immunoprecipitation studies 

Similarly, immunoprecipitation studies are another useful 
way to exploit the resolving power of proteomics. In
this instance, very large quantities of protein (e.g. several 
milligrams) can be subjected to incubation with antibodies 
against chosen signalling molecules. This allows high-affin-
ity capture of these proteins, which can subsequently be
eluted and electrophoresed on a 2D gel to provide a high- 
resolution proteome of a specific subset of proteins. 
Detection by blot analysis allows the identification of ex- 
tremely small amounts of defined signalling molecules. 
Again, the different isoforms of even very low abundance 
proteins can be seen, and, very importantly, the technique 
allows the investigator to identify multiprotein complexes 
or other proteins that co-precipitate with the target protein. 
These coassociating proteins frequently represent sig- 
nalling partners for the target protein, and their identifi- 
cation by mass spectrometry can lead to invaluable infor- 
mation on the signalling processes involved. 

The depth of signal transduction analysis offered by 
proteomics, and the utility for target validation studies, 
can be extended even further by applying cell fraction-
ation studies. By purifying subcellular fractions, such
as membrane, nuclear, organelle and cytosolic, it is possi-
ble to assign a localization to proteins of interest and to
follow their trafficking in a cell. Enrichment of these frac- 
tions will also allow much higher representation of low 
abundance proteins on the proteome. Their detection by 
fluorescent dyes or immunoblot techniques will lead to 
the identification of proteins in the range of 1-10 copies 
per cell, putting the sensitivity on a par with genomic 
approaches. 

These signal transduction analyses can be of additional 
value in experiments where inhibitors derived from a 
screening programme against the target are being evalu- 
ated for their potency and selectivity. The inhibitors can 
encompass small molecules, antisense nucleic acid con- 
structs, dominant-negative proteins, or neutralizing anti- 
bodies microinjected into cells. In each case, proteome 
analysis can provide unique data in support of validation 
studies for a chosen candidate drug target. 

Proteomics and drug mode-of-action studies 

Once a validated target is committed to a screening regi- 
men to identify and advance a lead molecule, it is impor- 
tant to confirm that the efficacy of the inhibitor is through 
the expected mechanism. Such mode-of-action studies are 
usually tackled by various cell biological and biochemical 
methods. Proteomics can also be usefully applied to these 
studies and this is illustrated below by describing data ob- 
tained with OGT719. This is a novel galactosyl derivative of 
the cytotoxic agent 5-fluorouracil (5-FU), which is currently 
being developed by OGS for the treatment of hepatocel- 
lular carcinoma and colorectal metastases localized 
in the liver. The premise underpinning the design and ra-
tionale of OGT719 was to derive a 5-FU prodrug capable
of targeting, and being retained in, cells bearing the asialo-
glycoprotein receptor (ASGP-r), including hepatocytes,
hepatoma Huh7 cells and some colorectal tumour cells.
The growth of the human hepatoma cell line Huh7 is in-
hibited by 5-FU or by OGT719. If the inhibition by
OGT719 were the result of uptake and conversion to 5-FU
as the active component, then it would be expected that
Huh7 cells would show similar proteome profiles follow-
ing exposure to either drug.

Figure 4. Features that are specifically up- or downregulated in Huh7 cells by either 5-fluorouracil (5-FU) or
OGT719: (a) elongation factor 1α2, (b) novel (three peptides by MS-MS) and (c) α-subunit of prolyl-4-hydroxylase.
Arrows indicate up- or downregulated.

DDT Vol. 4, No. 2 February 1999

To examine these possibilities, we conducted an experi- 
ment taking samples of Huh7 cells that had been treated 
with doses of either OGT719 or 5-FU. Total cell lysates 
were prepared and taken through 2D electrophoresis, 
fluorescence staining, digital imaging and Proteograph 
analysis. To facilitate the interpretation of the data across 
all of the 2291 features seen on the proteomes, drug- 
induced protein changes of fivefold or greater, identified 
by the Proteograph, were analysed further. Interestingly, 
from this analysis 19 identical proteins were changed five- 
fold or more by both drugs, strongly suggesting similarities 
in the mode of action for these two compounds. 
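
The comparison described above amounts to intersecting two sets of features that pass the same threshold. The following sketch shows that logic under stated assumptions: fold changes are stored as signed numbers keyed by MCI, the function names are hypothetical, and the values are invented for illustration. It does not reproduce the 19 proteins reported in the study.

```python
def changed(fold_changes, cutoff=5.0):
    """MCIs whose absolute fold change meets or exceeds the cutoff."""
    return {mci for mci, fc in fold_changes.items() if abs(fc) >= cutoff}


def co_modulated(drug_a, drug_b, cutoff=5.0):
    """MCIs changed at or beyond the cutoff by both drugs."""
    return changed(drug_a, cutoff) & changed(drug_b, cutoff)


# Hypothetical signed fold changes (positive = up, negative = down)
five_fu = {1: 6.0, 2: -7.5, 3: 2.0, 4: 5.0}
ogt719 = {1: 5.5, 2: -9.0, 3: 8.0, 5: 12.0}

print(sorted(co_modulated(five_fu, ogt719)))
```

This version does not check that the two drugs move a feature in the same direction; a stricter comparison could add that condition. An empty result, as reported later for the HSN cells, would indicate no shared response at the chosen threshold.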

Thus, from very complex data involving >2000 protein 
features, using proteomics it is possible to analyse quanti- 
tatively and qualitatively each protein during its exposure 
to drugs. The biologist is now able to focus a series of fur- 
ther studies specifically on an enriched subset of proteins. 



Figure 4 shows highlighted examples of the selected areas 
of the proteome where some of these identified proteins in 
the above study are altered in response to either or both 
drugs. 

Several of the proteins identified above as being modu- 
lated similarly by 5-FU or OGT719 in Huh7 cells were sub- 
jected to tandem mass-spectrometric analysis for anno- 
tation. Some of these, such as the nuclear ribosomal 
RNA-binding protein, can be placed into pyrimidine
pathways or related cell cycle/growth biochemical path- 
ways in which 5-FU is known to act. 

To attribute further significance to the proteome mode- 
of-action studies with OGT719, another cell line, the rat 
sarcoma HSN, was used. Growth of these cells is inhibited 
by 5-FU, but they are completely refractory to OGT719; 
notably they lack the ASGP-r, which might explain this 
finding (unpublished). For our proteome studies, HSN 
cells were treated with 5-FU or OGT719 over a time course 
of one, two and four days. At each time point, cells were 
harvested and processed to derive proteomes and 
Proteographs. As before, we purposely focused on those 
proteins that increased or decreased by fivefold or more. 
In this instance, there were no proteins co-modulated by 
the two drugs. This is perhaps to be expected, given that 
the HSN cells are killed by 5-FU and yet are refractory to 
OGT719. 






Clear potential 

The above is just an example of how proteomics can be 
used to address the mode of action of anticancer drugs. 
The potential of this approach is clear, and one can envis- 
age situations where it will be profitable to compare the 
proteomes of cells in which the drug target has been elimi- 
nated by molecular knockout techniques, or with small- 
molecule inhibitors believed to act specifically on the same 
target. In addition to using proteomics to examine the ac- 
tion of drugs, it is also possible to use this approach to 
gauge the extent of nonspecific effects that might eventu- 
ally lead to toxicity. For instance, in the example used 
above with HSN cells treated with OGT719, although cell 
growth was not affected, the levels of several specific pro- 
teins were changed. Further investigation of these proteins 
and the signalling pathways in which they are involved 
could be illuminating in predicting the likelihood or other- 
wise of long-term toxicity. 

Use of proteomics in formal drug 
toxicology studies

A drug discovery programme at the stage where leads 
have been identified and mode-of-action studies are ad- 
vanced, will proceed to investigate the pharmacokinetic 
and toxicology profile of those agents. These two param- 
eters are of major importance in the drug discovery 
process, and many agents that have looked highly promis- 
ing from in vitro studies have subsequently failed because 
of insurmountable pharmacokinetic and/or toxicity prob- 
lems in vivo. Whereas the pharmacokinetic properties of a 
molecule can now be characterized quickly and accu- 
rately, toxicity studies are typically much longer and more 
demanding in their interpretation. 

The ability to achieve fast and accurate predictions of 
toxicity within an in vivo setting would represent a big 
step forward in accelerating any drug discovery pro- 
gramme. Toxicity from a drug can be manifested in any 
organ. However, because the liver and kidney are the 
major sites in the body responsible for metabolism and 
elimination of most drugs, it is informative to examine 
these particular organs in detail to provide early indi- 
cations about events that might result in toxicity. 

The basis for most xenobiotic metabolizing activity is to
increase the hydrophilicity of the compound and so facili-
tate its removal from the body. Most drugs are metabo-
lized in the liver via the cytochrome P450 family of en-
zymes, which are known to comprise a total of ~200
different members, encompassing a wide array of
overlapping specificities for different substrates. In addi-
tion to clearance, they also play a major role in metabo-
lism that can lead to the production and removal of toxic
species, and in some instances it is possible to correlate
the ability or failure to remove such a toxin with a specific
P450 or subgroup.

Unique P450 profiles 

Each individual person will have a slightly different P450
profile, largely from polymorphisms and changes in ex-
pression levels, although other genetic and environmental
factors aside from P450 also need to be taken into consid-
eration. A significant amount of research is currently
being directed towards this field - known as pharmacoge-
nomics - with the aim of predicting how a patient will re-
spond to a drug, as determined by their genetic make-
up. The marked variation of individuals in their ability
to clear a compound can be one of the key factors in de-
ciding the overall pharmacokinetic profile of a drug. Not
only will this have a bearing on the likelihood of a patient 
responding to a treatment, but it will also be a factor in 
determining the possibility of their experiencing an ad- 
verse effect. 

Many pharmaceutical companies are already employing 
genomic approaches, involving P450 measurements, as a 
key step in their assessment of the toxicological profile of
a candidate drug and therefore of its suitability, or other- 
wise, to be considered for human clinical trials. There are 
limits to this approach, however. Whereas the P450 mRNA 
profiling can predict with some accuracy the likely meta- 
bolic fate of a drug, it will not provide information on 
whether the metabolites would subsequently lead to tox- 
icity. Besides the patient-to-patient differences in steady- 
state levels of the P450s, there are also characteristic induc- 
tion responses of these enzymes to some drugs. Moreover, 
as there can be some doubt over the correlation of mRNA 
levels and the corresponding protein levels, there is scope 
for misinterpretation of the results and hence real advan- 
tages to be gained from a proteome approach. In both in- 
stances, the ability to examine entire proteome profiles, in- 
cluding the P450 proteins, will be a significant advantage 
in understanding and predicting the metabolism and 
toxicological outcome of drugs. 

In addition to direct organ and tissue studies, the serum, 
which collects the majority of toxicity markers released 
from susceptible organs and tissues throughout the entire 
body, can be utilized. Serum is rich in nuclease activity 
and, as pharmacogenomics is not suited to deal with these 
samples, valuable markers of toxicity could go undetected. 
However, by using proteomics for these types of analyses, 
serum markers (and clusters thereof) are now accessible
for evaluation as indicators of toxicity.

Pharmacoproteomics

Proteomics can thus be used to add a new sphere of 
analysis to the study of toxicity at the protein level, and in 
the era of '-omics' there is a case to be made to adopt the 
term 'Pharmacoproteomics'. Animals can be dosed with
increasing levels of an experimental drug over time, and 
serum samples can be drawn for consecutive proteome 
analyses. Using this procedure, it should be possible to 
identify individual markers, or clusters thereof, that are 
dose related and correlate with the emergence and severity 
of toxicity. Markers might appear in the serum at a defined 
drug dose and time that are predictive of early toxicity 
within certain organs and if allowed to continue will have 
damaging consequences. These serum markers could sub- 
sequently be used to predict the response of each individ- 
ual and allow tailoring of therapy whereby optimal effi- 
cacy is achieved without adverse side effects being 
apparent. This application can obviously extend to track- 
ing toxicity of drugs in clinical trials where serum can be 
readily drawn and analysed. Surrogate markers for drug ef- 
ficacy could also be detected by this procedure and could 
facilitate the challenge of identifying patient classes who 
will respond favourably to a drug and at what dosage. 
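
One simple way to look for dose-related serum markers of the kind described above is to correlate each marker's level with the administered dose. The sketch below assumes a Pearson correlation and a threshold of 0.9; both are illustrative choices rather than anything stated in the review, and the marker identifiers and values are invented.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def dose_related(doses, markers, threshold=0.9):
    """MCIs whose serum level rises or falls consistently with dose."""
    return sorted(
        mci for mci, levels in markers.items()
        if abs(pearson(doses, levels)) >= threshold
    )


# Hypothetical study: five dose levels, three serum features
doses = [0, 1, 2, 4, 8]
markers = {
    201: [10, 12, 14, 18, 26],  # rises with dose
    202: [10, 9, 11, 10, 10],   # essentially flat
    203: [30, 28, 25, 20, 9],   # falls with dose
}
print(dose_related(doses, markers))
```

A real study would add replicate animals, time as a second variable, and a significance test; the sketch only shows how individual markers or clusters might be shortlisted.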

Conclusions

By contrast to the agents administered to patients in clini- 
cal wards, the process of drug discovery is not a prescrip- 
tive series of steps. The risks are high and there are long 
timelines to be endured before it is known whether a can- 
didate drug will succeed or fail. At each step of the drug 
discovery process there is often scope for flexibility in in- 
terpretation, which over many steps is cumulative. The 
pharmaceutical companies most likely to succeed in this 
environment are those that are able to make informed 
accurate decisions within an accelerated process. 

The genomics revolution has impacted very positively 
upon these issues and now has a powerful new partner in 
proteomics. The ability to undertake global analysis of pro- 
teins from a very wide diversity of biological systems and 
to interrogate these in a high-throughput, systematic man- 
ner will add a significant new dimension to drug discov- 
ery. Each step of the process from target discovery to clini- 
cal trials is accessible to proteomics, often providing 
unique sets of data. Using the combination of genomics 
and proteomics, scientists can now see every dimension of 
their biological focus, from genes, mRNA, proteins and 
their subcellular localization. This will greatly assist our 
understanding of the fundamental mechanistic basis of 
human disease and allow new improved and speedier 
drug discovery strategies to be implemented.

IN THE UNITED STATES PATENT AND TRADEMARK OFFICE



DECLARATION OF TOD BEDILION, Ph.D. 
UNDER 37 C.F.R. § 1.132 



I, TOD BEDILION, Ph.D., declare and state as
follows:

1. In April, 1996, I became the first employee of 
Synteni, Inc., where I served as Research Director until its 
acquisition by Incyte Corporation in early 1998. After
Synteni's acquisition, I continued in the position of Director
of Corporate Development at Incyte until May 11, 2001. I am
currently the Director of Business Development at Genomic
Health, Inc., Redwood City, California and an occasional 
Consultant to Incyte. 

2. Synteni was founded to commercialize expression 
microarrays, microarrays in which expressed nucleic acids — 
full-length cDNAs, fragments of full-length cDNAs, expressed
sequence tags (ESTs) — are arrayed on a common support to 
permit highly parallel detection and measurement of the 
expression of their cognate genes in a biological sample. 

3. During my employ at Synteni, virtually all (if 
not all) of my work efforts were directed to the further 
technical development and the commercial exploitation of that 
microarray technology; given the small size of our shop, most 
of us had both technical and commercial responsibilities. The 
customer accounts for which I was personally responsible 
included large pharmaceutical companies, such as SmithKline
Beecham, large biotechnology companies, such as Genentech,
and small research institutes, such as DMAX Inc.

4. From my very first interaction with our
customers, consistently through to Synteni's acquisition by
Incyte, I heard uniform, consistent, and emphatic requests
that more genes be added to the arrays. This was true with
respect to both our original microarrays, based on customer-
provided genes and libraries, and our later, "generic", gene
expression microarrays, based upon the UniGene clone
collection (our so-called "UniGEM" arrays). From day 1, the
pressure on us was to print ever more spots on the array. It
was never a question: our customers wanted ever more genes on
the array, each new gene-specific probe providing 
incrementally more value to the customer. 

5. As a commercial enterprise, providing value to
our customers was our major concern. Thus, to increase the
value of our products and services in the marketplace - to
increase our ability to sell our microarrays and microarray
services, their "salability" - our efforts from the very
beginning were devoted to increasing the number of specific
genes whose expression could be detected with our microarrays.

6. Indeed, one of our major competitive advantages
in the marketplace - not just as regards other commercial
suppliers, but also with respect to the innumerable
laboratories and companies that were attempting to spot arrays
in their own "home-brew" facilities - was the number of
distinct gene-specific probes that we provided on our
expression microarrays. Our first 10,000 element UniGEM array
put the holy grail of gene expression analysis - the human
whole genome array - within sight for the very first time
(with respect to timing of the UniGEM program, we began project
planning and technology development in mid 1996 and delivered
our first 10,000 element standard content human arrays in the
first months of 1997, as I recall).

and all expressed genes, asking for probes specific to any

7. By the end of 1997, our efforts to provide the 
most comprehensive, and thus most valuable, human gene 
expression microarrays had been sufficiently successful that
Incyte agreed to acquire Synteni for a reported $80 million. 

8. I declare further that all statements made 
herein of my own knowledge are true and that all statements
made on information and belief are believed to be true, and 
further that these statements were made with the knowledge 
that willful false statements and the like so made are 
punishable by fine or imprisonment, or both, under 
Section 1001 of Title 18 of the United States Code and may 
jeopardize the validity of any patent application in which 
this declaration is filed or any patent that issues thereon. 

Tod Bedilion, Ph.D. Date

IN THE UNITED STATES PATENT AND TRADEMARK OFFICE



DECLARATION OF VISHWANATH R. IYER, Ph.D.
UNDER 37 C.F.R. § 1



I, VISHWANATH R. IYER, Ph.D., declare and state as
follows:

1. I am an Assistant Professor in the Section of 
Molecular Genetics and Microbiology, Institute of Cellular and 
Molecular Biology, University of Texas at Austin, where my 
laboratory currently studies global transcriptional control in 
yeast, gene expression programs during human cell 
proliferation, and genome-wide transcription factor targets in 
yeast and human. Immediately prior to this position, I spent
four years as a postdoctoral fellow in the laboratory of
Patrick O. Brown at Stanford University studying the
transcriptional programs of yeast and of human cells. My
curriculum vitae is attached hereto as Exhibit A.

2. Beginning in Dr. Brown's laboratory, where I 
helped to develop the first whole genome arrays for yeast and 
early versions of highly representative cDNA arrays for human 
cells, and continuing to the present day, I have used
microarray-based gene expression analysis as a principal 
approach in much of my research. 



3. Representative publications describing this 
work include: 



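DeRisi et al., "Exploring the metabolic and genetic control
of gene expression on a genomic scale," Science 278:680-686
(1997);

Marton et al., "Drug target validation and identification of
secondary drug target effects using DNA microarrays,"
Nature Med. 4:1293-1301 (1998);

Iyer et al., "The transcriptional program in the response of
human fibroblasts to serum," Science 283:83-87 (1999); and

Ross et al., "Systematic variation in gene expression
patterns in human cancer cell lines,"
Nature Genetics 24:227-235 (2000).
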
Two of the papers describe our use of microarray-based
expression profiling to explore the metabolic reprogramming
that occurs during major environmental changes, both in yeast
(DeRisi et al., during the shift from fermentation to
respiration) and in human cells (Iyer et al., human
fibroblasts exposed to serum). One reference describes our
use of expression profile analysis in drug target validation
and identification of secondary drug effects (Marton et al.).
And one describes our use of expression profiling as a
molecular phenotyping tool to discriminate among human cancer
cells (Ross et al.).

4. Whether used to elucidate basic physiological
responses, to study primary and secondary drug effects, or to
discriminate and classify human cancers, expression profiling
as we have practiced it relies for its power on comparison of
patterns of expression.

5. For example, we have demonstrated that we can
use the presence or absence of a characteristic drug
"signature" pattern of altered gene expression in drug-treated
cells to explore the mechanism of drug action and to identify
secondary effects that can signal potentially deleterious drug
side effects. As another example, we have demonstrated that
gene expression patterns can be used to classify human tumor
cell lines. While it is of course advantageous to know the
biological function of the encoded gene products in order to
reach a better understanding of the cellular mechanisms
underlying these results, these pattern-based analyses do not
require knowledge of the biological function of the encoded
proteins.

6. The resolution of the patterns used in such
comparisons is determined by the number of genes detected: the
greater the number of genes detected, the higher the
resolution of the pattern. It goes without saying that higher
resolution patterns are generally more useful in such
comparisons than lower resolution patterns. With such higher
resolution comes a correspondingly higher degree of
statistical confidence for distinguishing different patterns
as well as identifying similar ones.

7. Each gene included as a probe on a microarray
provides a signal that is specific to the cognate transcript,
at least to a first approximation.(1) Each new gene-specific
probe added to a microarray thus increases the number of genes
detectable by the device, increasing the resolving power of
the device. As I note above, higher resolution patterns are
generally more useful in comparisons than lower resolution
patterns. Accordingly, each new gene probe added to a
microarray increases the usefulness of the device in gene
expression profiling analyses. This proposition is so well-
established as to be virtually an axiom in the art, and has
been as long as I have been working in the field, and
certainly since the time I embarked on the production of whole
genome arrays in early 1996. Simply put, arrays with fewer
gene-specific probes are inferior to arrays with more gene-
specific probes.

(1) In a more nuanced view, it is certainly possible for a probe to
signal the presence of a variety of splice variants of a single gene,

(Continued...)

8. For example, our ability to subdivide cancers
into discriminable classes by expression profiling is limited
by the resolution of the patterns produced. With more genes
contributing to the expression patterns, we can potentially
draw finer distinctions among the patterns, thus subdividing
otherwise indistinguishable cancers into a greater number of
classes; the greater the number of classes, the greater the
likelihood that the cancers classified together will respond
similarly to therapeutic intervention, permitting better
individualization of therapy and, we hope, better treatment
outcomes.

9. If a gene does not change expression in an
experiment, or if a gene is not expressed and produces no
signal in an experiment, that is not to say that the probe
lacks usefulness on the array; it only means that an
insufficient number of conditions have been sampled to
identify expression changes. In fact, an experiment showing
that a gene is not expressed or that its expression level does
not change can be equally informative. To provide maximum
versatility as a research tool, the microarray should
include - and as a biologist I would want my microarray to
include - each newly identified gene as a probe.

(...Continued) without discriminating among them, and for a probe
to signal the presence of a variety of allelic variants of a single gene,
again without discriminating among them.

10. I declare further that all statements made
herein of my own knowledge are true and that all statements
made on information and belief are believed to be true, and
further that these statements were made with the knowledge
that willful false statements and the like so made are
punishable by fine or imprisonment, or both, under
Section 1001 of Title 18 of the United States Code and may
jeopardize the validity of any patent application in which
this declaration is filed or any patent that issues thereon.
EXHIBIT A

Docket No.: PC-0044 CIP 
USSN: 09/895,686



Vishwanath R. Iyer 

Assistant Professor 

Section of Molecular Genetics and Microbiology 

Institute of Cellular and Molecular Biology 

MBB3.212A, University of Texas at Austin 

Austin, TX 78712-0159 

Phone: 512-232-7833 

Fax: 512-232-3432 

Email: vishy@mail.utexas.edu 

Education/Training 

Bombay University Mumbai, India B.Sc. (1987), Chemistry & Biochemistry 

M. S. University of Baroda, Baroda, India M.Sc. (1989), Biotechnology 

Harvard University, Cambridge MA Ph.D. (1996), Genetics 

Stanford University, Stanford CA Post-doctoral (1996-2000), Genomics 

Research Experience 

9/00-5/03 Assistant professor, Section of Molecular Genetics and 
Microbiology, University of Texas, Austin TX 

■ Global transcriptional control in yeast 

■ Gene expression programs during human cell proliferation 

■ Genome-wide transcription factor targets in yeast and human 

■ Collaborative microarray facility 

5/96-8/00 Post-doctoral fellow Stanford University, Stanford CA
(Advisor: Dr. Patrick O. Brown)

■ Yeast whole-genome ORF and intergenic microarrays
■ Human cDNA microarrays for expression profiling

9/89-4/96 Graduate student Harvard University, Cambridge MA 

(Advisor: Dr. Kevin Struhl) 

■ Yeast transcriptional regulation 

Honours and Awards 

Government of India Biotechnology Fellowship (1987-1989) 
University Grants Commission Junior Research Fellowship (1989) 
Stanford University/NHGRI Genome Training Grant (1996) 

Invited Conference talks (selected) 

Invited Lecturer, NEC-Princeton Lectures in Biophysics 

Princeton, NJ (June 1998) 
Plenary Session Speaker, HGM '99 (HUGO Human Genome Meeting) 

Brisbane, Australia (April 1999) 
Invited Speaker, Gordon Research Conference "Human Molecular Genetics" 

Newport, RI (August 2001) 



Invited Speaker, Nature Genetics "Oncogenomics 2002" Conference 
Dublin, Ireland (May 2002) 

Invited Speaker, "Pathology Bioinformatics" Symposium, University of Michigan 
Ann Arbor, MI (November 2002) 

Invited Speaker, "Systems Biology: Genomic Approaches to Transcriptional
Regulation", Cold Spring Harbor Laboratory Meeting (March 2003)
Symposium co-Chair and Speaker, "Functional Genomics", American Society for
Biochemistry and Molecular Biology Meeting, San Diego, CA (April 2003)

[illegible] Symposium, International
Congress of Genetics, Melbourne, Australia (July 6-11, 2003)
Invited Speaker, "BioArrays Europe 2003"

Cambridge, UK (Sep/Oct 2003)

Departmental Seminars 

Texas A&M University Genetics and Biochemistry & Biophysics Departments 
October 24 2002

New York University School of Medicine, Department of Biochemistry 
November 20 2002 

UT Southwestern Medical Center, Human Genetics Seminar Series 
May 5 2002

UCLA School of Medicine, Department of Human Genetics 
June 2 2003 

National Human Genome Research Institute 
June 12 2003 

Sanger Institute of the Wellcome Trust, Hinxton, UK 
Sep 2003 

Other Professional Activities 

[illegible], Genome Research,

[illegible] "[illegible] Microarrays" ([illegible] 2003)

Member, NIDDK Special Emphasis Review Panel ZDK1 (2001-2002)

Publications 

1. Iyer V. & Struhl K. (1995) Poly(dA:dT), a ubiquitous promoter element that
stimulates transcription via its intrinsic DNA structure. EMBO J. 14: 2570-2579.

2. Iyer V. & Struhl K. (1995) Mechanism of differential utilization of the his3 TR and TC
TATA elements. Mol. Cell. Biol. 15: 7059-7066.

3. Iyer V. & Struhl K. (1996) Absolute mRNA levels and transcription initiation rates in
Saccharomyces cerevisiae. Proc. Natl. Acad. Sci. (USA) 93: 5208-5212.

4. DeRisi J. L., Iyer V. R. & Brown P. O. (1997) Exploring the metabolic and genetic
control of gene expression on a genomic scale. Science 278: 680-686.

5. Marton M. J., DeRisi J. L., Bennett H. A., Iyer V. R., Meyer M. R., Roberts C. J.,
Stoughton R., Burchard J., Slade D., Dai H., Bassett D. E. Jr., Hartwell L. H., Brown
P. O. & Friend S. H. (1998) Drug target validation and identification of secondary
drug target effects using DNA microarrays. Nature Med. 4: 1293-1301.

6. Lutfiyya L. L., Iyer V. R., DeRisi J., DeVit M. J., Brown P. O. & Johnston M. (1998)
Characterization of three related glucose repressors and genes they regulate in
Saccharomyces cerevisiae. Genetics 150: 1377-1391.

7. Spellman P. T., Sherlock G., Zhang M. Q., Iyer V. R., Anders K., Eisen M. B., Brown P.
O., Botstein D. & Futcher B. (1998) Comprehensive identification of cell cycle-
regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization.
Mol. Biol. Cell 9: 3273-3297.

8. Iyer V. R., Eisen M. B., Ross D. T., Schuler G., Moore T., Lee J. C. F., Trent J. M.,
Staudt L. M., Hudson Jr. J., Boguski M. S., Lashkari D., Shalon D., Botstein D. &
Brown P. O. (1999) The transcriptional program in the response of human
fibroblasts to serum. Science 283: 83-87.

9. DeRisi J. L. & Iyer V. R. (1999) Genomics and array technology. Curr. Opin. Oncol.
11: 70-79.

10. Ross D. T., Scherf U., Eisen M. B., Perou C. M., Spellman P., Iyer V. R., Rees C.,
Jeffrey S. S., Van de Rijn M., Waltham M., Pergamenschikov A., Lee J. C. F.,
Lashkari D., Shalon D., Myers T. G., Weinstein J. N., Botstein D. & Brown P. O.
(2000) Systematic variation in gene expression patterns in human cancer cell lines.
Nature Genetics 24: 227-235.

11. Sudarsanam P., Iyer V. R., Brown P. O. & Winston F. (2000) Whole-genome
expression analysis of snf/swi mutants of S. cerevisiae. Proc. Natl. Acad. Sci. (USA)
97: 3364-3369.

12. Tran H. G., Steger D. J., Iyer V. R. & Johnson A. D. (2000) The chromo domain
protein Chd1p from budding yeast is an ATP-dependent chromatin-modifying factor.
EMBO J. 19: 2323-2331.

13. Gross C., Kelleher M., Iyer V. R., Brown P. O. & Winge D. R. (2000) Identification
of the copper regulon in Saccharomyces cerevisiae by DNA microarrays. J. Biol.
Chem. 275: 32310-32316.

14. Reid J. L., Iyer V. R., Brown P. O. & Struhl K. (2000) Coordinate regulation of yeast
ribosomal protein genes is associated with targeted recruitment of Esa1 histone
acetylase. Mol. Cell 6: 1297-1307.



15. Iyer V. R., Horak C., Scafe C. S., Botstein D., Snyder M. & Brown P. O. (2001)
Genomic binding sites of the yeast cell-cycle transcription factors SBF and MBF.
Nature 409: 533-538.

16. Miki R., Kadota K., Bono H., Mizuno Y., Tomaru Y., Carninci P., Itoh M., Shibata K.,
Kawai J., Konno H., Watanabe S., Sato K., Tokusumi Y., Kikuchi N., Ishii Y.,
Hamaguchi Y., Nishizuka I., Goto H., Nitanda H., Satomi S., Yoshiki A., Kusakabe
M., DeRisi J. L., Eisen M. B., Iyer V. R., Brown P. O., Muramatsu M., Shimada H.,
Okazaki Y. & Hayashizaki Y. (2001) Delineating developmental and metabolic
pathways in vivo by expression profiling using the RIKEN set of 18,816 full-length
enriched mouse cDNA arrays. Proc. Natl. Acad. Sci. (USA) 98: 2199-2204.

17. Pollack J. R. & Iyer V. R. (2002) Characterizing the physical genome. Nature
Genetics 32 suppl: 515-521.

18. Iyer V. R. Microarray-based detection of DNA protein interactions: Chromatin
Immunoprecipitation on Microarrays, in DNA Microarrays: A Molecular Cloning
Manual, pp. 453-463 (Cold Spring Harbor Laboratory Press, 200[illegible])*
*(not peer reviewed)

19. Killion P., Sherlock G. & Iyer V. R. (2003) The Longhorn Array Database, an
open-source implementation of the Stanford Microarray Database. BMC
Bioinformatics 4: 32.

20. Hahn J. S., Hu Z., Thiele D. J. & Iyer V. R. Genome-Wide Analysis of the Biology of
Stress Responses Through Heat Shock Transcription Factor (submitted to PNAS).

21. Kim J. & Iyer V. R. The global role of TBP recruitment to promoters in mediating
gene expression profiles (manuscript in preparation).



Current/Pending Research Support 

U01 AA13518-01 Adron Harris (PI) 25% effort

9/28/01 - 9/27/06

NIH/NIAAA

"INIA: Microarray Core"

[illegible] in response to Integrative Neuroscience Initiative on Alcoholism
(INIA) RFA AA-01-002. The overall goal is to support the use of microarray technology
to define changes in gene expression that either predict or accompany excessive alcohol
consumption.

Role: Co-investigator



003658-0223-2001 Iyer (PI) 16% effort
01/01/02-08/31/04

Texas Higher Education Coordinating Board (ARP)

"Microarray based global mapping of DNA-protein interactions at promoters in human
[illegible]"

[illegible] interactions of transcription factors with human [illegible]
Role: PI



Information Technology Research 0325116 R. Mooney (PI) 9% effort
09/01/03-08/31/07

NSF

"[illegible] Multi-Source Data Mining to Experimentation for Gene Network
Discovery"

Role: Co-investigator



1 R01 CA95548-01A2 (pending) Iyer (PI) 25% effort

12/1/03 - 11/30/08

NIH

"[illegible] transcriptional control in yeast"

[illegible] through [illegible]

Role: PI



Breast Cancer Idea Award (pending) Iyer (PI) 10% effort 
1/1/04 - 12/31/06 

US Army Medical Research and Materiel Command 

"Genome-wide chromosomal targets of oncogenic transcription factors" 

This is a project aimed at identifying direct chromosomal targets of c-myc and ER in
human cells through the use of a novel sequence tag analysis method.

003658-0531-2003 (pending) Marcotte (PI) 8% effort
01/01/04-12/31/05

Texas Higher Education Coordinating Board (ATP)

"[illegible] platform for measuring gene function on a
[illegible]"
This proposal is aimed at developing a novel microarray based platform for automated
[illegible] systematic [illegible]
Exploring the Metabolic and Genetic Control of
Gene Expression on a Genomic Scale

Joseph L. DeRisi, Vishwanath R. Iyer, Patrick O. Brown*

DNA microarrays containing virtually every gene of Saccharomyces cerevisiae were used
to carry out a comprehensive investigation of the temporal program of gene expression
accompanying the metabolic shift from fermentation to respiration. The expression
profiles observed for genes with known metabolic functions pointed to features of the
metabolic reprogramming that occur during the diauxic shift, and the expression patterns
of many previously uncharacterized genes provided clues to their possible functions. The
same DNA microarrays were also used to identify genes whose expression was affected
by deletion of the transcriptional co-repressor TUP1 or overexpression of the transcrip-
tional activator YAP1. These results demonstrate the feasibility and utility of this ap-
proach to genomewide exploration of gene expression patterns.



The complete sequences of nearly a dozen 
microbial genomes are known, and in the 
next several years we expect to know the 
complete genome sequences of several 
metazoans, including the human genome.
Defining the role of each gene in these
genomes will be a formidable task, and un-
derstanding how the genome functions as a
whole in the complex natural history of a 
living organism presents an even greater 
challenge. 

Knowing when and where a gene is 
expressed often provides a strong clue as to 
its biological role. Conversely, the pattern 
of genes expressed in a cell can provide 
detailed information about its state. Al- 
though regulation of protein abundance in 
a cell is by no means accomplished solely 
by regulation of mRNA, virtually all dif-
ferences in cell type or state are correlated
with changes in the mRNA levels of many 
genes. This is fortuitous because the only 
specific reagent required to measure the 
abundance of the mRNA for a specific 
gene is a cDNA sequence. DNA microar- 
rays, consisting of thousands of individual 
gene sequences printed in a high-density 
array on a glass microscope slide (1, 2),
provide a practical and economical tool 
for studying gene expression on a very 
large scale (3-6). 

Saccharomyces cerevisiae is an especially

Department of Biochemistry, Stanford University School
of Medicine, Howard Hughes Medical Institute, Stanford,
CA 94305-5428, USA.

*To whom correspondence should be addressed. E-mail:
pbrown@cmgm.stanford.edu



favorable organism in which to conduct a 
systematic investigation of gene expression. 
The genes are easy to recognize in the ge- 
nome sequence, cis regulatory elements are 
generally compact and close to the tran- 
scription units, much is already known 
about its genetic regulatory mechanisms, 
and a powerful set of tools is available for its
analysis. 

A recurring cycle in the natural history 
of yeast involves a shift from anaerobic 
(fermentation) to aerobic (respiration) me- 
tabolism. Inoculation of yeast into a medi- 
um rich in sugar is followed by rapid growth 
fueled by fermentation, with the production 
of ethanol. When the fermentable sugar is 
exhausted, the yeast cells turn to ethanol as 
a carbon source for aerobic growth. This 
switch from anaerobic growth to aerobic 
respiration upon depletion of glucose, re- 
ferred to as the diauxic shift, is correlated 
with widespread changes in the expression 
of genes involved in fundamental cellular 
processes such as carbon metabolism, pro- 
tein synthesis, and carbohydrate storage 
(7). We used DNA microarrays to charac- 
terize the changes in gene expression that 
take place during this process for nearly the 
entire genome, and to investigate the ge- 
netic circuitry that regulates and executes 
this program. 

Yeast open reading frames (ORFs) were 
amplified by the polymerase chain reaction 
(PCR), with a commercially available set of
primer pairs (8). DNA microarrays, con- 
taining approximately 6400 distinct DNA 
sequences, were printed onto glass slides by 
using a simple robotic printing device (9).
Cells from an exponentially growing culture
of yeast were inoculated into fresh medium
and grown at 30°C for 21 hours. After an
initial 9 hours of growth, samples were har-
vested at seven successive 2-hour intervals,
and mRNA was isolated (10). Fluorescently
labeled cDNA was prepared by reverse tran-
scription in the presence of Cy3 (green)-
or Cy5 (red)-labeled deoxyuridine triphos-
phate (dUTP) (11) and then hybridized to
the microarrays (12). To maximize the re-
liability with which changes in expression
levels could be discerned, we labeled cDNA
prepared from cells at each successive time
point with Cy5, then mixed it with a Cy3-
labeled "reference" cDNA sample prepared
from cells harvested at the first interval
after inoculation. In this experimental de-
sign, the relative fluorescence intensity
measured for the Cy3 and Cy5 fluors at
each array element provides a reliable mea-
sure of the relative abundance of the corre-
sponding mRNA in the two cell popula-
tions (Fig. 1). Data from the series of seven
samples (Fig. 2), consisting of more than
43,000 expression-ratio measurements,
were organized into a database to facilitate
efficient exploration and analysis of the
results. This database is publicly available
on the Internet (13).
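
The two-color design described above reduces each array element to a single Cy5/Cy3 ratio. The following is a minimal sketch of that calculation, not the authors' analysis pipeline: the gene names, the intensity values, and the `floor` parameter are hypothetical, and expressing the ratio on a log2 scale is an assumption made here for convenience.

```python
import math

def log2_ratio(cy5, cy3, floor=1.0):
    """Return log2(Cy5 / Cy3) for one array element.

    cy5: intensity of the time-point sample (hypothetical units)
    cy3: intensity of the reference sample
    floor: assumed lower bound applied to both channels so that a
           zero or negative background-corrected value cannot cause
           a division by zero or a log of a non-positive number.
    """
    return math.log2(max(cy5, floor) / max(cy3, floor))

# Hypothetical background-corrected intensities: (Cy5, Cy3)
spots = {
    "geneA": (4000.0, 1000.0),  # 4-fold more abundant than reference
    "geneB": (500.0, 2000.0),   # 4-fold less abundant than reference
    "geneC": (1500.0, 1500.0),  # unchanged
}

ratios = {gene: log2_ratio(cy5, cy3) for gene, (cy5, cy3) in spots.items()}
```

On this scale a positive value marks an mRNA more abundant at the later time point than in the reference, and a negative value marks one less abundant.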

During exponential growth in glucose-
rich medium, the global pattern of gene
expression was remarkably stable. Indeed,
when gene expression patterns between the
first two cell samples (harvested at a 2-hour
interval) were compared, mRNA levels dif-
fered by a factor of 2 or more for only 19
genes (0.3%), and the largest of these dif-
ferences was only 2.7-fold (14). However, as
glucose was progressively depleted from the
growth media during the course of the ex-
periment, a marked change was seen in the
global pattern of gene expression. mRNA
levels for approximately 710 genes were
induced by a factor of at least 2, and the
mRNA levels for approximately 1030 genes
declined by a factor of at least 2. Messenger
RNA levels for 183 genes increased by a
factor of at least 4, and mRNA levels for
203 genes diminished by a factor of at least
4. About half of these differentially ex-
pressed genes have no currently recognized
function and are not yet named. Indeed,
more than 400 of the differentially ex-
pressed genes have no apparent homology
to any gene whose function is known (15).
The responses of these previously unchar-
acterized genes to the diauxic shift therefore
provide the first small clue to their possible
roles.
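
The counts quoted above (genes induced or repressed by a factor of at least 2, or at least 4) amount to thresholding expression ratios. A minimal sketch of such a tally follows; the function name and the example ratios are hypothetical, and the ratios are assumed to be plain (not log-transformed) sample/reference values.

```python
def classify_by_fold_change(ratios, threshold=2.0):
    """Split genes into induced and repressed sets.

    ratios: mapping of gene name -> expression ratio (sample / reference)
    threshold: fold change required in either direction
    A gene is 'induced' if its ratio is >= threshold and 'repressed'
    if its ratio is <= 1 / threshold.
    """
    induced = [g for g, r in ratios.items() if r >= threshold]
    repressed = [g for g, r in ratios.items() if r <= 1.0 / threshold]
    return induced, repressed

# Hypothetical expression ratios for five genes
example = {"g1": 4.5, "g2": 0.4, "g3": 1.2, "g4": 2.0, "g5": 0.5}

induced_2x, repressed_2x = classify_by_fold_change(example, threshold=2.0)
induced_4x, repressed_4x = classify_by_fold_change(example, threshold=4.0)
```

Raising the threshold from 2 to 4 shrinks both sets, which mirrors the drop from roughly 710 to 183 induced genes reported in the text.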

The global view of changes in expres-
sion of genes with known functions pro-
vides a vivid picture of the way in which
the cell adapts to a changing environ-
ment. Figure 3 shows a portion of the yeast
metabolic pathways involved in carbon
and energy metabolism. Mapping the
changes we observed in the mRNAs en-
coding each enzyme onto this framework
allowed us to infer the redirection in the
flow of metabolites through this system.
We observed large inductions of the genes
coding for the enzymes aldehyde dehydro-
genase (ALD2) and acetyl-coenzyme
A (CoA) synthase (ACS1), which func-
tion together to convert the products of
alcohol dehydrogenase into acetyl-CoA,
which in turn is used to fuel the tricarbox-
ylic acid (TCA) cycle and the glyoxylate
cycle. The concomitant shutdown of tran-
scription of the genes encoding pyruvate
decarboxylase and induction of pyruvate
carboxylase rechannels pyruvate away
from acetaldehyde, and instead to oxalac-
etate, where it can serve to supply the
TCA cycle and gluconeogenesis. Induc-
tion of the pivotal genes PCK1, encoding
phosphoenolpyruvate carboxykinase, and
FBP1, encoding fructose 1,6-biphos-
phatase, switches the directions of two key
irreversible steps in glycolysis, reversing
the flow of metabolites along the revers-
ible steps of the glycolytic pathway toward
the essential biosynthetic precursor, glu-
cose-6-phosphate. Induction of the genes
coding for the trehalose synthase and gly-
cogen synthase complexes promotes chan-
neling of glucose-6-phosphate into these
carbohydrate storage pathways.

Just as the changes in expression of
genes encoding pivotal enzymes can pro-
vide insight into metabolic reprogram-
ming, the behavior of large groups of func-
tionally related genes can provide a broad
view of the systematic way in which the
yeast cell adapts to a changing environ-
ment (Fig. 4). Several classes of genes,
such as cytochrome c-related genes and
those involved in the TCA/glyoxylate cy-
cle and carbohydrate storage, were coordi-
nately induced by glucose exhaustion. In
contrast, genes devoted to protein synthe-
sis, including ribosomal proteins, tRNA
synthetases, and translation, elongation,
and initiation factors, exhibited a coordi-
nated decrease in expression. More than
95% of ribosomal genes showed at least
twofold decreases in expression during the
diauxic shift (Fig. 4) (13). A noteworthy
and illuminating exception was that the



genes encoding mitochondrial ribosomal
genes were generally induced rather than
repressed after glucose limitation, high-
lighting the requirement for mitochondrial
biogenesis (13). As more is learned about
the functions of every gene in the yeast
genome, the ability to gain insight into a
cell's response to a changing environment
through its global gene expression patterns
will become increasingly powerful.

Several distinct temporal patterns of ex-
pression could be recognized, and sets of
genes could be grouped on the basis of the
similarities in their expression patterns. The
characterized members of each of these
groups also shared important similarities in
their functions. Moreover, in most cases,
common regulatory mechanisms could be
inferred for sets of genes with similar expres-
sion profiles. For example, seven genes
showed a late induction profile, with mRNA
levels increasing by more than ninefold at
the last timepoint but less than threefold at
the preceding timepoint (Fig. 5B). All of
these genes were known to be glucose-re-
pressed, and five of the seven were previously
noted to share a common upstream activat-
ing sequence (UAS), the carbon source re-
sponse element (CSRE) (16-20). A search
in the promoter regions of the remaining two
genes, ACR1 and IDP2, revealed that
ACR1, a gene essential for ACS1 activity,
also possessed a consensus CSRE motif, but
interestingly, IDP2 did not. A search of the
entire yeast genome sequence for the con-
sensus CSRE motif revealed only four addi-
tional candidate genes, none of which
showed a similar induction.

Examples from additional groups of 
genes that shared expression profiles are 
illustrated in Fig. 5, C through F. The 
sequences upstream of the named genes in 
Fig. 5C all contain stress response ele- 
ments (STRE), and with the exception 




Fig. 1. Yeast genome microarray. The actual size of the microarray is 18 mm by 18 mm. The
microarray was printed as described (9). This image was obtained with the same fluorescent
scanning confocal microscope used to collect all the data we report (49). A fluorescently labeled
cDNA probe was prepared from mRNA isolated from cells harvested shortly after inoculation (culture
density of <6 x 10^[exponent illegible] cells/ml and media glucose level of 19 g/liter) by reverse transcription in the
presence of Cy3-dUTP. Similarly, a second probe was prepared from mRNA isolated from cells taken
from the same culture 9.5 hours later (culture density of ~2 x 10^[exponent illegible] cells/ml, with a glucose level of
[illegible]) by reverse transcription in the presence of Cy5-dUTP. In this image, hybridization of the
Cy3-dUTP-labeled cDNA (that is, mRNA expression at the initial timepoint) is represented as a green
signal, and hybridization of Cy5-dUTP-labeled cDNA (that is, mRNA expression at 9.5 hours) is
represented as a red signal. Thus, genes induced or repressed after the diauxic shift appear in this
image as red and green spots, respectively. Genes expressed at roughly equal levels before and after
the diauxic shift appear in this image as yellow spots.
of HSP42, have previously been shown to
be controlled at least in part by these
elements (21-24). Inspection of the se-
quences upstream of HSP42 and the two
uncharacterized genes shown in Fig. 5C,
YKL026c, a hypothetical protein with
similarity to glutathione peroxidase, and
YGR043c, a putative transaldolase, re-
vealed that each of these genes also pos-
sesses repeated upstream copies of the stress-
responsive CCCCT motif. Of the 13 ad-
ditional genes in the yeast genome that
shared this expression profile [including
HSP30, ALD2, [illegible], and 10 uncharac-
terized ORFs (25)], nine contained one or
more recognizable STRE sites in their up-
stream regions.

The heterotrimeric transcriptional acti-
vator complex HAP2,3,4 has been shown
to be responsible for induction of several
genes important for respiration (26-28).
This complex binds a degenerate consensus
sequence known as the CCAAT box (26).
Computer analysis, using the consensus se-
quence TNRYTGGB (29), has suggested
that a large number of genes involved in
respiration may be specific targets of
HAP2,3,4 (30). Indeed, a putative
HAP2,3,4 binding site could be found in
the sequences upstream of each of the seven
cytochrome c-related genes that showed
the greatest magnitude of induction (Fig.
5D). Of 12 additional cytochrome c-related
genes that were induced, HAP2,3,4 binding
sites were present in all but one. Signifi-
cantly, we found that transcription of
HAP4 itself was induced nearly ninefold
concomitant with the diauxic shift.
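
A degenerate consensus such as TNRYTGGB can be searched for by expanding each IUPAC ambiguity code into a character class. The sketch below illustrates that general technique only; it is not the computer analysis cited in the text (29, 30). The function names and the example upstream sequence are hypothetical, and only the given strand is scanned.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def consensus_to_pattern(consensus):
    """Translate a degenerate consensus into a regex pattern string."""
    return "".join(IUPAC[base] for base in consensus.upper())

def find_sites(sequence, consensus):
    """Return (position, matched_text) for every match, overlaps included.

    A zero-width lookahead is used so that overlapping sites are reported.
    Positions are 0-based offsets into the given strand only.
    """
    lookahead = re.compile("(?=(" + consensus_to_pattern(consensus) + "))")
    return [(m.start(), m.group(1)) for m in lookahead.finditer(sequence.upper())]

# Hypothetical upstream sequence containing one site
upstream = "AAATCACTGGCAAA"
sites = find_sites(upstream, "TNRYTGGB")
```

A fuller search would also scan the reverse complement of each upstream region, since a binding site can lie on either strand.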

Control of ribosomal protein biogenesis
is mainly exerted at the transcriptional
level, through the presence of a common
upstream-activating element (UASrpg)
that is recognized by the Rap1 DNA-bind-
ing protein (31, 32). The expression pro-
files of seven ribosomal proteins are shown
in Fig. 5F. A search of the sequences
upstream of all seven genes revealed con-
sensus Rap1-binding motifs (33). It has
been suggested that declining Rap1 levels
in the cell during starvation may be re-
sponsible for the decline in ribosomal pro-
tein gene expression (34). Indeed, we ob-
served that the abundance of RAP1
mRNA diminished by 4.4-fold, at about
the time of glucose exhaustion.

Of the 149 genes that encode known or 
putative transcription factors, only two, 
HAP4 and SIP4, were induced by a factor of
more than threefold at the diauxic shift.
SIP4 encodes a DNA-binding transcrip-
tional activator that has been shown to
interact with Snf1, the "master regulator" of
glucose repression (35). The eightfold in-
duction of SIP4 upon depletion of glucose
strongly suggests a role in the induction of



downstream genes at the diauxic shift. 

Although most of the transcriptional 
responses that we observed were not pre- 
viously known, the responses of many 
genes during the diauxic shift have been 
described. Comparison of the results we 
obtained by DNA microarray hybridiza- 
tion with previously reported results there- 
fore provided a strong test of the sensitiv- 
ity and accuracy of this approach. The 
expression patterns we observed for previ- 
ously characterized genes showed almost 
perfect concordance with previously pub- 
lished results (36). Moreover, the differ-
ential expression measurements obtained
by DNA microarray hybridization were re-
producible in duplicate experiments. For
example, the remarkable changes in gene
expression between cells harvested imme-
diately after inoculation and immediately
after the diauxic shift (the first and sixth 
intervals in this time series) were mea- 
sured in duplicate, independent DNA mi- 
croarray hybridizations. The correlation 
coefficient for two complete sets of expres- 
sion ratio measurements was 0.87, and for 
more than 95% of the genes, the expres- 



sion ratios measured in these duplicate 
experiments differed by less than a factor 
of 2. However, in a few cases, there were
discrepancies between our results and pre-
vious results, pointing to technical limita-
tions that will need to be addressed as 
DNA microarray technology advances 
(37, 38). Despite the noted exceptions, 
the high concordance between the results 
we obtained in these experiments and 
those of previous studies provides confi- 
dence in the reliability and thoroughness 
of the survey. 
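
The two reproducibility figures quoted above (a correlation coefficient across duplicate hybridizations, and the fraction of genes whose ratios agree within a factor of 2) could be computed along the following lines. This is only a minimal sketch and not the authors' analysis code: the function name and the toy ratios are hypothetical, and correlating log-transformed ratios is an assumption, since the text does not say whether raw or log ratios were used.

```python
import math


def duplicate_agreement(ratios_a, ratios_b, max_fold=2.0):
    """Compare two replicate sets of expression ratios (same gene order).

    Returns (pearson_r_of_log10_ratios, fraction_of_genes_within_max_fold).
    """
    if len(ratios_a) != len(ratios_b) or len(ratios_a) < 2:
        raise ValueError("need two equal-length lists with at least 2 genes")
    xs = [math.log10(r) for r in ratios_a]
    ys = [math.log10(r) for r in ratios_b]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant ratios")
    r = cov / math.sqrt(var_x * var_y)
    # Two ratios differ by less than max_fold when the absolute
    # difference of their logs is below log10(max_fold).
    limit = math.log10(max_fold)
    within = sum(1 for x, y in zip(xs, ys) if abs(x - y) < limit)
    return r, within / n


# Hypothetical expression ratios for five genes in two replicate arrays.
rep1 = [8.0, 0.25, 1.1, 3.0, 0.9]
rep2 = [6.5, 0.30, 0.9, 7.0, 1.0]
r, frac = duplicate_agreement(rep1, rep2)
print(f"r = {r:.2f}; within 2-fold: {frac:.0%}")
```

Working in log space treats a twofold induction and a twofold repression symmetrically, which is why the "factor of 2" test is written as a bound on the difference of logs.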

The changes in gene expression during 
this diauxic shift are complex and involve 
integration of many kinds of information 
about the nutritional and metabolic state 
of the cell. The large number of genes 
whose expression is altered and the diver- 
sity of temporal expression profiles ob- 
served in this experiment highlight the 
challenge of understanding the underlying 
regulatory mechanisms. One approach to 
defining the contributions of individual
regulatory genes to a complex program of 
this kind is to use DNA microarrays to 
identify genes whose expression is affected 



Fig. 2. The section of the ar-
ray indicated by the gray box
in Fig. 1 is shown for each of
the experiments described
here. Representative genes
are labeled. In each of the ar-
rays used to analyze gene
expression during the diauxic
shift, red spots represent
genes that were induced rel-
ative to the initial timepoint,
and green spots represent
genes that were repressed
relative to the initial timepoint.
In the arrays used to analyze
the effects of the tup1Δ mu-
tation and YAP1 overexpres-
sion, red spots represent
genes whose expression was
increased, and green spots
represent genes whose ex-
pression was decreased by
the genetic modification. Note
that distinct sets of genes are
induced and repressed in the
different experiments. The
complete images of each of
these arrays can be viewed on
the Internet (13). Cell density
as measured by optical densi-
ty (OD) at 600 nm was used to
measure the growth of the
culture.
by mutations in each putative regulatory
gene. As a test of this strategy, we analyzed
the genomewide changes in gene expression
that result from deletion of the TUP1 gene.
Transcriptional repression of many genes by
glucose requires the DNA-binding repressor
Mig1 and is mediated by recruiting the tran-
scriptional co-repressors Tup1 and Cyc8/
Ssn6 (39). Tup1 has also been implicated in
repression of oxygen-regulated, mating-type-
specific, and DNA-damage-inducible genes
(40).
Fig. 3. Metabolic reprogramming inferred from global analysis of changes in gene expression. Only key
metabolic intermediates are identified. The yeast genes encoding the enzymes that catalyze each step
in this metabolic circuit are identified by name in the boxes. The genes encoding succinyl-CoA synthase
and glycogen-debranching enzyme have not been explicitly identified, but the ORFs YGR244 and
YPR184 show significant homology to known succinyl-CoA synthase and glycogen-debranching en-
zymes, respectively, and are therefore included in the corresponding steps in this figure. Red boxes with
white lettering identify genes whose expression increases in the diauxic shift. Green boxes with dark
green lettering identify genes whose expression diminishes in the diauxic shift. The magnitude of
induction or repression is indicated for these genes. For multimeric enzyme complexes such as
succinate dehydrogenase, the indicated fold-induction represents an unweighted average of all the
genes listed in the box. Black and white boxes indicate no significant differential expression (less than
twofold). The direction of the arrows connecting reversible enzymatic steps indicates the direction of the
flow of metabolic intermediates, inferred from the gene expression pattern, after the diauxic shift. Arrows
representing steps catalyzed by genes whose expression was strongly induced are highlighted in red.
The broad gray arrows represent major increases in the flow of metabolites after the diauxic shift,
inferred from the indicated changes in gene expression.



Wild-type yeast cells and cells bearing
a deletion of the TUP1 gene (tup1Δ) were
grown in parallel cultures in rich medium
containing glucose as the carbon source.
Messenger RNA was isolated from expo-
nentially growing cells from the two pop-
ulations and used to prepare cDNA la-
beled with Cy3 (green) and Cy5 (red),
respectively (11). The labeled probes were
mixed and simultaneously hybridized to
the microarray. Red spots on the microar-
ray therefore represented genes whose
transcription was induced in the tup1Δ
strain, and thus presumably repressed by
Tup1 (41). A representative section of the
microarray (Fig. 2, bottom middle panel)
illustrates that the genes whose expression
was affected by the tup1Δ mutation were,
in general, distinct from those induced
upon glucose exhaustion [complete images
of all the arrays shown in Fig. 2 are avail-
able on the Internet (13)]. Nevertheless,
34 (10%) of the genes that were induced
by a factor of at least 2 after the diauxic
shift were similarly induced by deletion of
TUP1, suggesting that these genes may be
subject to TUP1-mediated repression by
glucose. For example, SUC2, the gene en-
coding invertase, and all five hexose trans-
porter genes that were induced during the
course of the diauxic shift were similarly
induced, in duplicate experiments, by the
deletion of TUP1.

The set of genes affected by Tup1 in this
experiment also included α-glucosidases,
the mating-type-specific genes MFA1 and
MFA2, and the DNA damage-inducible
RNR2 and RNR4, as well as genes involved
in flocculation and many genes of unknown
function. The hybridization signal corre-
sponding to expression of TUP1 itself was
also severely reduced because of the (in-
complete) deletion of the transcription unit
in the tup1Δ strain, providing a positive
control in the experiment (42).

Many of the transcriptional targets of
Tup1 fell into sets of genes with related
biochemical functions. For instance, al-
though only about 3% of all yeast genes
appeared to be TUP1-repressed by a factor
of more than 2 in duplicate experiments
under these conditions, 6 of the 13 genes
that have been implicated in flocculation
(15) showed a reproducible increase in
expression of at least twofold when TUP1
was deleted. Another group of related
genes that appeared to be subject to TUP1
repression encodes the serine-rich cell
wall mannoproteins, such as Tip1 and
Tir1/Srp1, which are induced by cold
shock and other stresses (43), and similar,
serine-poor proteins, the seripauperins
(44). Messenger RNA levels for 23 of the
26 genes in this group were reproducibly
elevated by at least 2.5-fold in the tup1Δ
strain, and 18 of these genes were induced
by more than sevenfold when TUP1 was
deleted. In contrast, none of 83 genes that
could be classified as putative regulators of
the cell division cycle were induced more
than twofold by deletion of TUP1. Thus,
despite the diversity of the regulatory sys-
tems that employ Tup1, most of the genes
that it regulates under these conditions
fall into a limited number of distinct func-
tional classes.
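
To give a sense of how unlikely the flocculation result would be by chance, the quoted figures (6 of 13 genes affected against a background rate of about 3%) can be put through a simple binomial tail calculation. This is an illustrative sketch only: the binomial model, which treats genes as independent draws, is an assumption introduced here and is not a test the authors report; the function name is hypothetical.

```python
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Figures quoted in the text: 6 of 13 flocculation genes were induced,
# against a background of roughly 3% of all yeast genes.
p_value = binomial_tail(6, 13, 0.03)
print(f"P(at least 6 of 13 by chance) = {p_value:.2e}")
```

A hypergeometric test against the full gene count would be the more exact choice; the binomial form is used here only because the text gives the background as a percentage rather than as a count.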

Because the microarray allows us to
monitor expression of nearly every gene in
yeast, we can, in principle, use this ap-
proach to identify all the transcriptional
targets of a regulatory protein like Tup1. It
is important to note, however, that in any
single experiment of this kind we can only
recognize those target genes that are nor-
mally repressed (or induced) under the
conditions of the experiment. For in-
stance, the experiment described here an-
alyzed a MATα strain in which MFA1
and MFA2, the genes encoding the a-
factor mating pheromone precursor, are
normally repressed. In the isogenic tup1Δ
strain, these genes were inappropriately
expressed, reflecting the role that Tup1
plays in their repression. Had we instead
carried out this experiment with a MATa
strain (in which expression of MFA1 and
MFA2 is not repressed), it would not have
been possible to conclude anything re-
garding the role of Tup1 in the repression
of these genes. Conversely, we cannot dis-
tinguish indirect effects of the chronic
absence of Tup1 in the mutant strain from
effects directly attributable to its partici-
pation in repressing the transcription of a
gene.

Another simple route to modulating the
activity of a regulatory factor is to overex-
press the gene that encodes it. YAP1 en-
codes a DNA-binding transcription factor
belonging to the b-zip class of DNA-bind-
ing proteins. Overexpression of YAP1 in
yeast confers increased resistance to hydro-
gen peroxide, o-phenanthroline, heavy
metals, and osmotic stress (45). We ana-
lyzed differential gene expression between a
wild-type strain bearing a control plasmid
and a strain with a plasmid expressing YAP1
under the control of the strong GAL1-10
promoter, both grown in galactose (that is,
a condition that induces YAP1 overexpres-
sion). Complementary DNA from the con-
trol and YAP1-overexpressing strains, la-
beled with Cy3 and Cy5, respectively, was
prepared from mRNA isolated from the two
strains and hybridized to the microarray.
Thus, red spots on the array represent genes
that were induced in the strain overexpress-
ing YAP1.

Of the 17 genes whose mRNA levels
increased by more than threefold when
YAP1 was overexpressed in this way, five
bear homology to aryl-alcohol oxidoreduc-
tases. An additional
four of the genes in this set also belong to
the general class of dehydrogenases/oxi-
doreductases. Very little is known about
the role of aryl-alcohol oxidoreductases in
S. cerevisiae, but these enzymes have been
isolated from ligninolytic fungi, in which
they participate in coupled redox reac-
tions, oxidizing aromatic and aliphatic
unsaturated alcohols to aldehydes with the
production of hydrogen peroxide (46, 47).
The fact that a remarkable fraction of the
targets identified in this experiment be-
long to the same small, functional group of
oxidoreductases suggests that these genes

Fig. 4. Coordinated reg-
ulation of functionally re-
lated genes. The curves
represent the average in-
duction or repression ra-
tios for all the genes in
each indicated group.
The total number of
genes in each group was
ing sites upstream of the others may reflect
an ability of Yap1 to bind sites that differ
from the canonical binding sites, perhaps in
cooperation with other factors, or less like-
ly, may represent an indirect effect of Yap1
overexpression, mediated by one or more
intermediary factors. Yap1 sites were found
only four times in the corresponding region
of an arbitrary set of 30 genes that were not
differentially regulated by Yap1.

Use of a DNA microarray to character-
ize the transcriptional consequences of
mutations affecting the activity of regula-
tory molecules provides a simple and pow-
erful approach to dissection and character-
ization of regulatory pathways and net-
works. This strategy also has an important
practical application in drug screening.
Mutations in specific genes encoding can-
didate drug targets can serve as surrogates
for the ideal chemical inhibitor or modu-
lator of their activity. DNA microarrays
can be used to define the resulting signa-
ture pattern of alterations in gene expres-
sion, and then subsequently used in an
assay to screen for compounds that repro-
duce the desired signature pattern.

DNA microarrays provide a simple and 
economical way to explore gene expres- 
sion patterns on a genomic scale. The 
hurdles to extending this approach to any 
other organism are minor. The equipment 
Fig. 5. Distinct temporal patterns of induction or repression help to group genes that share regulatory
properties. (A) Temporal profile of the cell density, as measured by OD at 600 nm, and glucose
concentration in the media. (B) Seven genes exhibit a strong induction at
the last timepoint (20.5 hours). [...] There
were no additional genes observed to match this profile. (C) Seven members of a class of genes marked
by early induction with a peak in mRNA levels at 18.5 hours. Each of these genes contains STRE motif
repeats in their upstream promoter regions. (D) Cytochrome c oxidase and ubiquinol cytochrome c
reductase genes, marked by an induction coincident with the diauxic shift; each of these genes [...]
expression profile. (E) SAM1, GPP1, and several genes of unknown function are repressed before the
diauxic shift, and continue to be repressed upon entry into stationary phase. (F) Ribosomal protein
genes compose a large class of genes that are repressed upon depletion of glucose. Each of the genes
profiled here contains one or more RAP1-binding motifs upstream of its promoter. RAP1 is a transcrip-
tional regulator of most ribosomal proteins.
required for fabricating and using DNA
microarrays (9) consists of components
that were chosen for their modest cost and
simplicity. It was feasible for a small group
to accomplish the amplification of more
than 6000 genes in about 4 months and,
once the amplified gene sequences were in
hand, only 2 days were required to print a
set of 110 microarrays of 6400 elements
each. Probe preparation, hybridization,
and fluorescent imaging are also simple
procedures. Even conceptually simple ex-
periments, as we described here, can yield
vast amounts of information. The value of 
the information from each experiment of 
this kind will progressively increase as 
more is learned about the functions of 
each gene and as additional experiments 
define the global changes in gene expres- 
sion in diverse other natural processes and 
genetic perturbations. Perhaps the greatest 
challenge now is to develop efficient 
methods for organizing, distributing, inter- 
preting, and extracting insights from the 
large volumes of data these experiments 
will provide. 
We describe here a method for drug target validation and identification of secondary drug tar-
get effects based on genome-wide gene expression patterns. The method is demonstrated by
several experiments, including treatment of yeast mutant strains defective in calcineurin, im-
munophilins or other genes with the immunosuppressants cyclosporin A or FK506. Presence or
absence of the characteristic drug 'signature' pattern of altered gene expression in drug-treated
cells with a mutation in the gene encoding a putative target established whether that target was
required to generate the drug signature. Drug-dependent effects were seen in 'targetless' cells,
showing that FK506 affects additional pathways independent of calcineurin and the im-
munophilins. The described method permits the direct confirmation of drug targets and recog-
nition of drug-dependent changes in gene expression that are modulated through pathways
distinct from the drug's intended target. Such a method may prove useful in improving the effi-
ciency of drug development programs.



Good drugs are potent and specific; that is, they must have
strong effects on a specific biological pathway and minimal ef-
fects on all other pathways. Confirmation that a compound in-
hibits the intended target (drug target validation) and the
identification of undesirable secondary effects are among the
main challenges in developing new drugs. Comprehensive
methods that enable researchers to determine which genes or
activities are affected by a given drug might improve the effi-
ciency of the drug discovery process by quickly identifying po-
tential protein targets, or by accelerating the identification of
compounds likely to be toxic. DNA microarray technology,
which permits simultaneous measurement of the expression
levels of thousands of genes, provides a comprehensive frame-
work to determine how a compound affects cellular metabolism
and regulation on a genomic scale. DNA microarrays that
contain essentially every open reading frame (ORF) in the
Saccharomyces cerevisiae genome have already been used success-
fully to explore the changes in gene expression that accompany
large changes in cellular metabolism or cell cycle progression.

In the modern drug discovery paradigm, which typically be-
gins with the selection of a single molecular target, the ideal in-
hibitory drug is one that inhibits a single gene product so
completely and so specifically that it is as if the gene product
were absent. Treating cells with such a drug should induce
changes in gene expression very similar to those resulting from
deleting the gene encoding the drug's target. Here we have com-
pared the genome-wide effects on gene expression that result
from deletions of various genes in the budding yeast S. cerevisiae
to the effects on gene expression that result from treatment
with known inhibitors of those gene products. Using the cal-
cineurin signaling pathway as a model system, we tested an ap-
proach that permits identification of genes that encode proteins
specifically involved in pathways affected by a drug. The FK506
characteristic pattern, or 'signature', of altered gene expression
was not observed in mutant cells lacking proteins inhibited by
FK506 (for example, a calcineurin or FK506-binding-protein
mutant strain), but was observed in mutants deleted for genes
in pathways unrelated to FK506 action (for example, a cy-
clophilin mutant strain). Conversely, the cyclosporin A (CsA)
signature was not observed in CsA-treated calcineurin or cy-
clophilin mutant strains, but was seen in an FK506-binding-pro-
tein mutant strain treated with CsA. The method also
demonstrates that FK506, a clinically used immunosuppressant,
has 'off-target' effects that are independent of its binding to im-
munophilins. Thus, the approach we describe may provide a
way to identify the pathways altered by a drug and to detect
drug effects mediated through unintended targets.

Null mutants phenocopy drug-treated cells on a genomic scale 
To test whether a null mutation in a drug target serves as a
model of an ideal inhibitory drug, we examined the effects on
gene expression associated with pharmacological or genetic in-
hibition of calcineurin function. Calcineurin is a highly con-
served calcium- and calmodulin-activated serine/threonine
protein phosphatase implicated in diverse processes dependent
on calcium signaling. In budding yeast, calcineurin is re-
quired for intracellular ion homeostasis, for adaptation to pro-
longed mating pheromone treatment and in the regulation of
Fig. 1 Model of antagonism of the calcineurin signaling pathway mediated
by FK506 and cyclosporin A (CsA). Calcineurin activity is composed of a cat-
alytic subunit (calcineurin A, encoded in yeast by the CNA1 and CNA2 genes)
and calcium-binding regulatory subunits calmodulin (CMD) and calcineurin B
(CnB). After entering cells, FK506 and CsA specifically bind and inhibit the
peptidyl-proline isomerase activity of their respective immunophilins, FK506-
binding proteins (FKBP) and cyclophilins (CyP). The most abundant im-
munophilins in yeast (Fpr1 and Cph1) are thought to mediate calcineurin in-
hibition. Drug-immunophilin complexes bind and inhibit the calcium- and
calmodulin-stimulated phosphatase calcineurin. Among the substrates of cal-
cineurin are transcriptional activators that act to modulate gene expression.


the onset of mitosis. In mammals, calcineurin has been impli-
cated in T-cell activation, in apoptosis, in cardiac hypertro-
phy and in the transition from short-term to long-term
memory. In both organisms, calcineurin activity is inhibited
by FK506 and CsA, immunosuppressant drugs whose effects on
calcineurin are mediated through families of intracellular recep-
tor proteins called immunophilins (Fig. 1). To assess the ef-
fects of pharmacologic inhibition of calcineurin, wild-type S.
cerevisiae was grown to early logarithmic phase in the presence
or absence of FK506 or CsA. Isogenic cells, from which the
genes encoding the catalytic subunits of calcineurin (CNA1 and
CNA2) had been deleted (referred to as the cna or calcineurin
mutant), were grown in parallel, in the absence of the drug.
Fluorescently-labeled cDNA was prepared by reverse transcrip-
tion of polyA+ RNA in the presence of Cy3- or Cy5-deoxynu-
cleotide triphosphates and then hybridized to a microarray
containing more than 6,000 DNA probes representing 97% of
the known or predicted ORFs in the yeast genome.
Simultaneous hybridization of Cy5-labeled cDNA from mock-
treated cells and Cy3-labeled cDNA from cells treated with 1
µg/ml FK506 allowed the effect of drug treatment on mRNA lev-
els of each ORF to be determined (Fig. 2a and b and data not
shown). Similarly, effects of the calcineurin mutations on the
mRNA levels of each gene were assessed by simultaneous hy-
bridization of Cy5-labeled cDNA from wild-type cells and Cy3-
labeled cDNA from the calcineurin mutant strain (Fig. 2c). For
each comparison of this kind, reported expression ratios are the
average of at least two hybridizations in which the Cy3 and Cy5
fluors were reversed to remove biases that may be introduced by
gene-specific differences in incorporation of the two fluors
(data not shown).
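
The reason fluor reversal removes a gene-specific labeling bias can be seen in a short sketch. The assumptions here are hypothetical and not taken from the paper: the bias is modeled as a single multiplicative factor on the measured red/green ratio, and the two hybridizations are averaged in log space (a geometric mean), whereas the text says only that the ratios were averaged.

```python
import math


def fluor_reversed_average(ratio_forward, ratio_reversed):
    """Combine a pair of hybridizations in which the two fluors were swapped.

    ratio_forward:  measured red/green ratio with sample A in Cy5, B in Cy3.
    ratio_reversed: measured red/green ratio with A in Cy3, B in Cy5, so the
                    same biological comparison appears inverted.
    Returns the geometric-mean estimate of A/B.
    """
    if ratio_forward <= 0 or ratio_reversed <= 0:
        raise ValueError("expression ratios must be positive")
    # Invert the swapped measurement, then average the two log ratios.
    log_mean = (math.log(ratio_forward) - math.log(ratio_reversed)) / 2
    return math.exp(log_mean)


# Hypothetical gene: true A/B ratio of 4, with a dye bias that inflates
# every measured red/green ratio for this gene by a factor of 1.5.
true_ratio, bias = 4.0, 1.5
forward = bias * true_ratio        # A in red:  measured 6.0
reversed_ = bias / true_ratio      # A in green: measured 0.375
print(fluor_reversed_average(forward, reversed_))
```

Because the bias multiplies both measurements in the same direction while the biological ratio flips, the bias cancels when the reversed measurement is inverted and the two are averaged.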

Treatment with FK506 in these growth conditions resulted in
a signature pattern of altered gene expression in which mRNA
levels of 36 ORFs changed by more than twofold
(http://www.rosetta.org). A very similar pattern of altered gene
expression was observed when the calcineurin mutant strain
was compared to wild-type cells. Comparison of the changes in
mRNA expression of each gene resulting from treatment of
wild-type cells with FK506 with mRNA expression changes re-
sulting from deletion of the calcineurin genes showed the con-
siderable similarity of the global transcript alterations in
response to the two perturbations (Fig. 2d). Quantification of
this similarity using the correlation coefficient (ρ) showed
large correlations between the FK506 treatment signature and
the calcineurin deletion signature (ρ = 0.75 ± 0.03), as well as
the CsA treatment signature (ρ = 0.94 ± 0.02), but not with a
randomly selected deletion mutant strain (deleted for the
YER071C gene; ρ = -0.07 ± 0.04; Fig. 2e). The FK506 treatment
signature was also compared with those of more than 40 other
deletion mutant strains or drug treatments thought to affect
unrelated pathways, and none had statistically significant cor-
relations. These data establish that genetic disruption of cal-
cineurin function provides a close and specific phenocopy of
treatment with FK506 or CsA.

To avoid generalizing from a single example, we also com-
pared the effects of treatment of wild-type cells with 3-aminotri-
azole (3-AT) with the effects of deletion of the HIS3 gene. HIS3
encodes imidazoleglycerol phosphate dehydratase, which cat-
alyzes the seventh step of the histidine biosynthetic pathway in
yeast; 3-AT is a competitive inhibitor of this enzyme that trig-
gers a large transcriptional amino-acid starvation response.
Microarray analysis of wild-type and isogenic his3-deficient
strains demonstrated the expected large genome-wide transcrip-
tional responses (involving more than 1,000 ORFs) resulting
from treatment with 3-AT (Fig. 3a) or from HIS3 deletion (Fig.
3c). Quantitative comparison of the 3-AT treatment signature
and the his3 mutant signature showed a high level of correlation
(ρ = 0.76 ± 0.02) that even extended to genes that experienced
small changes in expression level (Fig. 3b). As a negative control,
the correlations between the 3-AT treatment signature or the
his3 mutant signature and the calcineurin mutant strain were
not statistically significant (ρ = 0.09 ± 0.06 and -0.01 ± 0.04, re-
spectively). That both the calcineurin/FK506 and the his3/3-AT
comparisons were highly correlated indicates that in many cases
the expression profile resulting from a gene deletion closely re-
sembles the expression profile of wild-type cells treated with an
inhibitor of that gene's product.
'Decoder' strategy: Drug target validation with deletion mutants
Because pharmacological inhibition of different targets might
give similar or identical expression profiles, simple comparison
of drug signatures to mutant signatures is unlikely to unambigu-
ously identify a drug's target. To overcome this limitation, an
additional 'decoder' step is used. We first compare the expres-
sion profile of wild-type drug-treated cells to the expression pro-
files from a panel of genetic mutant strains, using a correlation
coefficient metric. Mutant strains whose expression profile is
similar to that of drug-treated wild-type cells are selected and
subjected to drug treatment, generating the drug signature in
the mutant strain (that is, the mutant drug signature). If the
mutated gene encodes a protein involved in a pathway affected
by the drug, we expect the drug signature in mutant cells to be
different (or absent, for an ideal drug) from the drug signature
seen in wild-type cells.
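
The two-step decoder logic described above might be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: the function names, the strain names, the toy log10 ratios and both correlation cutoffs (0.5) are hypothetical, and the paper does not state numeric thresholds.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


def decoder(wt_drug, mutant_sigs, mutant_drug_sigs, select_at=0.5, retain_at=0.5):
    """Classify mutants by whether the drug signature survives in them.

    wt_drug:          log10 ratios, drug-treated vs. untreated wild type.
    mutant_sigs:      strain -> log10 ratios, mutant vs. wild type (no drug).
    mutant_drug_sigs: strain -> log10 ratios, drug-treated vs. untreated mutant.
    """
    calls = {}
    for strain, signature in mutant_sigs.items():
        # Step 1: keep only mutants whose profile resembles the drug profile.
        if pearson(wt_drug, signature) < select_at:
            continue
        # Step 2: does the drug still produce its signature in this mutant?
        rho = pearson(wt_drug, mutant_drug_sigs[strain])
        calls[strain] = "retained" if rho >= retain_at else "lost"
    return calls


# Hypothetical log10 expression ratios for six ORFs.
wt_drug = [0.9, -0.7, 0.5, 0.0, -0.4, 0.8]
mutant_sigs = {
    "target": [0.8, -0.6, 0.6, 0.1, -0.5, 0.7],
    "bystander": [0.7, -0.5, 0.4, 0.1, -0.3, 0.9],
    "unrelated": [0.0, 0.3, -0.1, 0.4, 0.2, -0.2],
}
mutant_drug_sigs = {
    "target": [0.05, 0.02, -0.04, 0.03, -0.01, -0.02],
    "bystander": [0.85, -0.65, 0.55, 0.05, -0.35, 0.75],
}
calls = decoder(wt_drug, mutant_sigs, mutant_drug_sigs)
print(calls)
```

A "lost" call corresponds to the case in the text where the mutated gene encodes a protein in a pathway affected by the drug; a "retained" call means the drug acts independently of that gene.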
Fig. 2 Expression profiles from FK506-treated wild-type (wt) cells and a
calcineurin-disruption mutant strain share a genome-wide correlation. DNA
microarray analysis showing changes in gene expression resulting from FK506
treatment (a and b) or from genetic disruption of genes encoding calcineurin
(c). a, Pseudo-color image of the results of simultaneous hybridization of
Cy5-labeled cDNA (red) from mock-treated strain R563 and Cy3-labeled cDNA
(green) from strain R563 treated with 1 µg/ml FK506. b, Enlarged view of the
boxed area in a. Arrowheads indicate specific ORFs induced or repressed.
c, Pseudo-color image of the results of simultaneous hybridization of
Cy5-labeled cDNA (red) from strain R563 and Cy3-labeled cDNA (green) from
strain MCY300 (deleted for the CNA1, CNA2 catalytic subunits of calcineurin).
Arrows indicate specific ORFs induced or repressed. d, The log10 of the
expression ratio for each ORF derived from the FK506 treatment hybridizations
is plotted versus the log10 of the expression ratio in the calcineurin mutant
hybridizations. ORFs that were induced or repressed in both experiments are
shown as green and red dots, respectively. e, The log10 of the expression
ratio for each ORF derived from the FK506 treatment hybridizations is plotted
versus the log10 of the expression ratio in the yer071c mutant
hybridizations. No ORFs were induced or repressed in both experiments.
[Panel titles: wt, 1 µg/ml FK506; wt vs. calcineurin mutant. Axis labels:
Log10 (R/G) calcineurin mutation; Log10 (R/G) yer071c mutation.]
To illustrate this, we treated the his3 mutant strain with 3-
AT. The signature pattern of altered gene expression resulting
from treatment of the mutant strain with 3-AT was much less
complex than that of the 3-AT signature in wild-type cells (Fig.
4). This is seen simply by examining plots of mean intensity of
the hybridization signal (which approximately reflects level of
expression) versus the expression ratio for each ORF (Fig. 4).
Genes that were expressed at higher or lower levels in 3-AT
treated cells or in his3 mutant cells are shown as red and green
dots, respectively. We analyzed the 3-AT signature in wild-type
(Fig. 4a) and his3 mutant cells (Fig. 4c), as well as the his3 mu-
tant strain signature (Fig. 4b). Whereas histidine limitation in-
duced by 3-AT induced more than 1,000 transcription-level
changes in the wild-type strain, few or no transcript level
changes were induced by treatment of the his3-deletion strain
with 3-AT. This indicates that with the growth conditions used,
essentially all of the effects of 3-AT depend on or are mediated
through the HIS3 gene product.

Applying this approach to the calcineurin signaling pathway 
showed the specificity of the method. The calcineurin mutant 
strain and strains with deletions in the genes encoding the 
most abundant immunophilins in yeast (CPH1 and FPR1)
were treated with either FK506 or CsA to determine the profiles 



Table 1  Signature correlation of expression ratios as a result of FK506
treatment in various mutant strains

                       wild-type      cna            fpr1           cna fpr1       cph1
                       +/- FK506      +/- FK506      +/- FK506      +/- FK506      +/- FK506

wild-type +/- FK506    0.93 ± 0.04    -0.01 ± 0.07   -0.23 ± 0.07   0.12 ± 0.07    0.79 ± 0.03

Signature correlation shows the absence of the FK506 signature specifically in the calcineurin (cna) and fpr1
(major FK506 binding protein) deletion mutants. cna represents the mutant with deletions of the catalytic sub-
units of calcineurin, CNA1 and CNA2. The correlation coefficient reported in the first column represents the cor-
relation between two pairs of hybridizations from independent wild-type FK506 experiments.



of altered gene expression resulting from drug treatment of the
mutant cells (that is, mutant +/- drug). We compared the drug
signatures in the mutants to the wild-type drug signature using
the correlation coefficient metric (Table 1). Although the signa-
ture generated by treatment of wild-type cells with FK506 was
highly correlated to the calcineurin mutant strain signature (ρ
= 0.75 ± 0.03), it bore no similarity to the profile after treat-
ment of the calcineurin mutant strain with FK506 (ρ = -0.01 ±
0.07). This indicates that FK506 was unable to elicit its normal
transcriptional response in the calcineurin mutant strain.
Likewise, treatment of the fpr1 mutant strain with FK506
elicited an expression profile that was not correlated to the
FK506 signature in the wild-type strain (ρ = -0.23 ± 0.07), indi-
cating that the FPR1 gene product is likely to be involved in the
pathway affected by FK506. The same was true for the cna fpr1
mutant strain. In contrast, treatment of the cph1 mutant strain
with FK506 generated an expression profile highly correlated
with the wild-type FK506 expression profile (ρ = 0.79 ± 0.03),
indicating that the cph1 mutation did not block the mode of action
of FK506 and thus that CPH1 is not directly involved in the pathway af-
fected by FK506. We tabulated the change in expression in re-
sponse to FK506 in different mutant strains for all ORFs with
expression ratios greater than 1.8 in FK506-treated cells or in
the calcineurin mutant strain (Fig. 5a). The
calcineurin mutant strain signature and the
FK506 responses in wild-type and the cph1
mutant strain are similar, and there are no
transcript-level changes (seen in black) for
treatment of the calcineurin, fpr1 and cna
fpr1 mutant strains with FK506 (Fig. 5a).

Similar experiments and analyses with CsA 
provided further validation of this approach.
The expression profile elicited by treatment 
of wild-type cells with CsA was highly corre- 
Fig. 3 Expression profiles from a his3 mutant strain and wild-type (wt) cells
treated with 3-AT share a genome-wide correlation. DNA microarray analysis
showing changes in gene expression resulting from 3-AT treatment (a) or from
genetic disruption of the HIS3 gene (c). a, Pseudo-color image of the results
of simultaneous hybridization of Cy5-labeled cDNA (red) from mock-treated
wild-type strain R491 and Cy3-labeled cDNA (green) from strain R491 treated
with 10 mM 3-AT. b, The log10 of the expression ratio for each ORF derived
from the 3-AT treatment hybridizations is plotted versus the log10 of the
expression ratio in the his3 mutant hybridizations. ORFs that were induced or
repressed in both experiments are shown as green and red dots, respectively.
The correlation of expression ratios applies not only to genes with large
expression ratios (for example, CHA1 and ARG1), but also extends to genes with
expression ratios less than 2 (for example, UVl and CPH1). The first of these
is induced 1.9-fold and 1.5-fold, and CPH1 is downregulated 1.9-fold and
1.7-fold, in cells treated with 3-AT and his3 mutant cells, respectively. Two
ORFs do not fall on the line x = y. The leftmost point is the HIS3 data point,
which is induced by 3-AT treatment but which is absent from the his3 mutant
strain. The other point is YOR203w. Both data points are labeled HIS3 because
hybridization to YOR203w is most likely due to HIS3 mRNA, as YOR203w overlaps
the HIS3 open reading frame. c, Pseudo-color image of the results of
simultaneous hybridization of Cy5-labeled cDNA (red) from wild-type strain
R491 and Cy3-labeled cDNA (green) from strain R1226, deleted for the HIS3
gene. Arrowheads indicate specific ORFs induced or repressed.
[Panel title: wt vs. his3 mutation. Axis label: Log10 (R/G) his3 mutation.]
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lated to the profile elicited by mutation of the calcineurin genes 
(p - 0.71 ± 0.04). but did not correlate vnth the expression pro- 
file resulting from treatment of the calcineurin mutant strain 
with CsA (p - -0.05 ± 0.07; Table 2). indicating that the genetic 
deletion of calcineurin interfered with the ability of CsA to 
elicit its normal transcriptional response. Likewise, the CsA sig- 
nature was essentially absent in CsA-ireated cphJ mutant cells, 
and the expression profile of CsA-treated cphJ mutant cells cor- 
related poorly to that of CsA-treated wild-type cells (p = 0.18 ± 
0.07). Thus, the CPH] gene product was required for the CsA re- 
sponse seen in wild-type cells. Conversely, treatment of fprJ 
mutant cells with CsA resulted in an expression pattern very 
similar to the profile of CsA-treated wild-type cells (p = 0.77 t 
0.03). indicating that FPPJ was not necessary for the CsA-medi- 
aied effecu. Analysis of individual ORFs affected by CsA and 
their expression ratios over the entire set of experiments con- 
firmed that CPHJ and the genes encoding calcineurin. but not 



FPRl, are necessary for the wild- type CsA response (Fig. 56). The 
observation that the profiles resulting from FK506 or CsA drug 
treatment are similar to that of the calcineurin deletion mutant 
strain might allow the prediction that calcineurin was involved 
in the pathway affected by these drugs. But because the expres- 
sion pronie of the fprJ mutant strain did not bear a strong simi- 
larity to the wild-type drug expression profile for FK506. it is. 
obvious that the drug treatment of the mutant strains was nec- 
essary to identify Fprl. but not Cphl. as a potential FK506 drug 
target. In the same way. the decoder' strategy was necessary to 
identify Cphl. but not Fprl, as a potential drug target for CsA. 
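
The comparison logic described above can be sketched in a few lines of Python. The correlation values are the ones reported in the text; the 0.5 cutoff and the name `classify_mutant` are assumptions made purely for illustration and are not part of the published method.

```python
# Reported correlations (rho) between the drug signature in each mutant
# strain and the drug signature in wild-type cells (values from the text).
reported_rho = {
    ("FK506", "fpr1"): -0.23,
    ("FK506", "cph1"): 0.79,
    ("CsA", "cna"): -0.05,
    ("CsA", "cph1"): 0.18,
    ("CsA", "fpr1"): 0.77,
}


def classify_mutant(rho, cutoff=0.5):
    """Hypothetical rule: a low correlation means the deletion blocked the
    drug response, so the deleted gene is a candidate pathway member."""
    return "in pathway" if rho < cutoff else "not in pathway"


calls = {key: classify_mutant(rho) for key, rho in reported_rho.items()}
for (drug, mutant), call in sorted(calls.items()):
    print(f"{drug:6s} {mutant:5s} {call}")
```

With these inputs the sketch flags fpr1 for FK506 and both cna and cph1 for CsA, which matches the conclusions drawn in the text.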

'Decoder' approach can identify secondary drug effects 
For a drug that has a single biochemical target, the strategy outlined
above may be useful in target validation. In many cases, however, a
compound may affect multiple pathways and elicit a very complex
signature. 'Decoding' such a complex signature into the effects
mediated through the intended target (the 'on-target' signature) and
those mediated through unintended targets (the 'off-target' signature)
might be useful in evaluating a compound's specificity. Our 'decoder'
strategy is based on the premise that the 'off-target' signature should
be insensitive to the genetic disruption of the primary target.

Fig. 4 Treatment of the his3 mutant strain with 3-AT shows nearly
complete loss of 3-AT signature. A plot of the log10 of the mean
intensity of hybridization for each ORF versus the log10 of its
expression ratio for each experiment is shown next to a pseudo-color
image of a representative portion of the microarray. ORFs that are
induced or repressed at the 95% confidence level are shown in green and
red, respectively. a, Expression profile from treatment of the
wild-type (wt) strain with 3-AT. Cy5-labeled cDNA (red) from
mock-treated strain R491 and Cy3-labeled cDNA (green) from strain R491
treated with 10 mM 3-AT. b, Expression profile from the his3 deletion
strain. Cy5-labeled cDNA (red) from strain R491 and Cy3-labeled cDNA
(green) from strain R1226, deleted for the HIS3 gene. c, Expression
profile of treatment of the his3 deletion strain with 3-AT. Cy5-labeled
cDNA (red) from his3-deleted strain R1226 and Cy3-labeled cDNA (green)
from strain R1226 treated with 10 mM 3-AT. Arrowheads indicate the DNA
probe and data point corresponding to the HIS3 gene. The blue dashed
line represents the threshold below which errors tend to increase
rapidly because spot intensities are not sufficiently above background
intensity.

Table 2  Signature correlation of expression ratios as a result of CsA
treatment in various mutant strains

                    wild-type       cna             fpr1           cna cph1        cph1
                    +/-CsA          +/-CsA          +/-CsA         +/-CsA          +/-CsA
wild-type +/-CsA    0.9[?] ± 0.04   -0.05 ± 0.07    0.77 ± 0.03    -0.11 ± 0.07    0.18 ± 0.07

Signature correlation shows the absence of the CsA [illegible]
calcineurin, CNA1 and CNA2. The correlation coefficient [remainder of
footnote illegible]

To determine whether the 'decoder' approach could identify an
'off-target' profile, we looked for a drug-responsive gene whose
expression is insensitive to deletion of the primary target. To
increase the likelihood of observing such genes, the same strains
described in Tables 1 and 2 were treated with higher concentrations
(50 µg/ml) of FK506. This led to a much more complex expression profile
in wild-type cells, indicating that at this higher concentration, FK506
was inhibiting or activating additional targets. Several of the ORFs in
this expanded FK506-induced expression profile were not affected by the
calcineurin, cph1 or fpr1 mutations, as drug treatment of these mutant
strains did not block their presence in the FK506 expression signature
(Fig. 6). This indicates that FK506 was triggering changes in
transcript levels of many genes through pathways independent of
calcineurin, CPH1 and FPR1. Many of the upregulated ORFs in the
'off-target' pathway were genes reported to be regulated by the
transcriptional activator Gcn4 (ref. 24). In some strains, a reporter
gene under GCN4 control was induced in response to FK506 treatment. To
determine whether GCN4 is involved in this pathway that is independent
of calcineurin, CPH1 and FPR1, we analyzed the effects of treatment
with high-dose FK506 on global gene expression in a strain with a GCN4
deletion (Fig. 6). Of the 41 ORFs with calcineurin-independent
expression ratios greater than 4, 32 were not induced in the gcn4
mutant, indicating that their induction by FK506 was GCN4-dependent.
Not all GCN4-regulated genes were induced by FK506. This FK506-induced
subset of GCN4-regulated genes may be those most sensitive to subtle
changes in Gcn4 levels, or perhaps other regulatory circuits prevent
FK506 activation of some GCN4-regulated genes. Seven of the remaining
nine ORFs induced by FK506 were independent of both the calcineurin and
GCN4 pathways. The simplest explanation is that FK506 inhibits or
activates additional pathways. Members of this class include SNQ2 and
PDR5, genes that encode drug efflux pumps with structural homology to
mammalian multiple drug resistance proteins. FK506 may interact
directly with Pdr5 to inhibit its function. Our results indicate that
treatment with FK506 leads to fourfold-to-sixfold induction of PDR5
mRNA levels. YOR1, another gene that can confer drug resistance, is
induced threefold-to-fourfold by FK506. Thus, drug treatment of strains
with mutations in the primary targets can prove useful in identifying
effects mediated by secondary drug targets, including the nature and
extent of [illegible] previously unsuspected pathways affected by the
drug.

Fig. 5 Response of FK506 and CsA signature genes in strains with
deletions in different genes. Genes with expression ratios greater than
a factor of 1.8 in [illegible] 1 µg/ml FK506 (a) or 50 µg/ml CsA (b)
are listed (left side) and their expression ratios in the indicated
strain are shown on the green (induction)-red (repression) color scale.
a, Calcineurin (cna) mutant and FK506 treatment signature genes are in
the first two columns. Almost all FK506 signature genes have expression
ratios near unity in deletion strains involved in pathways affected by
FK506 (calcineurin, fpr1 and cna fpr1 mutants) but not in deletion
strains in unrelated pathways (cph1). b, Calcineurin (cna) mutant and
CsA treatment signature genes are in the first two columns. Almost all
CsA signature genes have expression ratios near unity in deletion
strains involved in pathways affected by CsA (calcineurin, cph1 and
cna cph1 mutants) but not in deletion strains in unrelated pathways
(fpr1).

We describe here a method for drug target validation and the
identification of secondary drug target effects that uses DNA
microarrays to survey the effects of drugs on global gene expression
patterns. We established that genetic and pharmacologic inhibition of
gene function can result in extremely similar changes in gene
expression. We also demonstrated that one can confirm a potential drug
target by treating a deletion mutant defective in the gene encoding the
putative target. Drug-mediated signatures from strains with mutations
in pathways or processes directly or indirectly affected by the drug
bore little or no similarity to the wild-type drug expression profile.
In contrast, drug-mediated signatures from strains with mutations in
genes involved in pathways unrelated to the drug's action showed
extensive similarity to the wild-type drug signature. By applying this
approach to a drug that affects multiple pathways (FK506), we were able
to decode a complex signature into component parts, including the
identification of an 'off-target' signature that was mediated through
pathways independent of calcineurin or the Fpr1 immunophilin.
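
One way to picture the decoding step is as a per-gene rule that looks at which deletions abolish the FK506 response. The sketch below is only an illustration of that idea: the gene names, expression ratios, strain labels and the twofold response threshold are all hypothetical, and the class names are borrowed from the Fig. 6 legend rather than from any published algorithm.

```python
def decode_class(ratios, threshold=2.0):
    """Assign a gene to a response class from its FK506-induced expression
    ratios in wild-type (wt) and deletion strains. Inputs are hypothetical."""
    responds = {strain: ratio >= threshold for strain, ratio in ratios.items()}
    if not responds["wt"]:
        return "not in signature"
    # Deletion strains in which the drug response was lost.
    blocked = {strain for strain, ok in responds.items() if not ok}
    if not blocked:
        return "CNA- and GCN4-independent"
    if blocked <= {"cna", "fpr1"}:
        return "CNA-dependent"
    if blocked == {"gcn4"}:
        return "GCN4-dependent"
    return "complex behavior"


# Made-up expression ratios for four made-up genes.
example = {
    "geneA": {"wt": 5.0, "cna": 1.1, "fpr1": 0.9, "gcn4": 5.2},
    "geneB": {"wt": 6.0, "cna": 5.5, "fpr1": 6.1, "gcn4": 1.0},
    "geneC": {"wt": 4.2, "cna": 4.0, "fpr1": 4.4, "gcn4": 3.9},
    "geneD": {"wt": 4.5, "cna": 1.0, "fpr1": 4.0, "gcn4": 1.2},
}
classes = {gene: decode_class(r) for gene, r in example.items()}
for gene, label in classes.items():
    print(gene, label)
```

Genes that fall into the "CNA- and GCN4-independent" class under such a rule would be the candidates for the 'off-target' signature.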

Discussion 

It is well-established that high-throughput biochemical screening can
identify potent inhibitory compounds against a given target. The
'decoder' approach described here complements this process by
evaluating the equally important property of specificity: the tendency
of a compound to inhibit pathways other than that of its intended
target. The ability to observe such 'off-target' effects will likely be
useful in several ways. Profiling compounds with known toxicities will
allow the development of a database of expression changes associated
with particular toxicities. Recognition of potential toxicities in the
'off-target' signatures of otherwise promising compounds then may allow
earlier identification of those likely to fail in clinical trials.
Comparing the extent and peculiarities of 'off-target' signatures of
promising drug candidates could provide a new way to group compounds by
their effects on secondary pathways, even before those effects are
understood. This may prove to be an alternative, potentially more
effective, way to select compounds for animal and clinical trials. Some
drugs are more effective against a related protein than against the
originally intended target. Sildenafil (Viagra™), for example, was
initially developed as a phosphodiesterase inhibitor to control cardiac
contractility, but was found to be highly specific for
phosphodiesterase 5, an isozyme whose inhibition overcomes defects in
penile erection. It is possible that application of the 'decoder' to
other compounds may show that they too have a potent activity against a
target distinct from their intended target.

Fig. 6 Response of FK506 signature genes in strains with deletions
[illegible] of 4 in at least one experiment are listed and their
expression ratios in the indicated strain are shown in the green
(induction)-red (repression) color scale. The genes have been divided
into classes corresponding to these expected behaviors: 'CNA-dependent'
genes respond to FK506 (50 µg/ml) except when either [illegible] FPR1
or both are deleted; 'GCN4-dependent' genes respond to FK506 except
when GCN4 is deleted. These genes still respond to FK506 when
calcineurin genes or FPR1 or CPH1 are deleted; that their response
[illegible] calcineurin, Cph1, or Fpr1. 'CNA- and GCN4-independent'
genes respond to FK506 in all deletion strains tested. A 'complex
behavior' class is provided for those genes that did not match the
model of FK506 response mediated [illegible] calcineurin or Fpr1 or
separately through Gcn4.

[illegible] effects is dependent on the availability of functionally
'targetless' cells. In yeast, this is being achieved by systematically
disrupting each yeast gene (Saccharomyces Deletion Consortium:
http://sequence-www.stanford.edu/group/yeast_deletion_project/deletion.html).
Efforts are underway to obtain expression profiles from each deletion
mutant strain. Determining signatures resulting from inactivation of
essential genes presents a unique problem, but it may be possible to do
so by examining heterozygotes or by using a controllable promoter to
reduce expression of the essential gene. Although it is already
feasible to test several compounds in dozens of yeast strains, another
challenge for the 'decoder' strategy will be the efficient selection of
the mutants with deletions in genes most likely to encode the intended
drug target. The signature correlation plots described are one metric
that could be used as part of that selection process, but others need
to be explored. Applying the 'decoder' to mammalian cells presents
additional challenges. It is considerably more difficult to isolate
functionally 'targetless' cells. Strategies involving titratable
promoters, known specific inhibitors, anti-sense RNAs, ribozymes, and
methods of targeting specific proteins for degradation are possible and
should be tested. Another limitation is that not all cell types express
the same set of genes, and therefore 'off-target' effects may be
different in different cell types. In addition, applying the 'decoder'
to human cells will also require technical improvements that allow
expression profiling from a small number of cells. Even the broader
question of whether the insensitivity of 'off-target' signatures to the
disruption of the main target is the exception or the rule can only be
answered by the accumulation of more data. Barkai and Leibler, however,
have argued in favor of robustness of biological networks, indicating
that drug perturbations ('off-target' signatures) may be robust even
when the system is subjected to another perturbation (such as a genetic
disruption) (ref. 28). Many practical developments will be necessary if
the 'decoder' concept is to be broadly applied.

Expression arrays have been used mainly as an initial screen for genes
induced in a particular tissue or process of interest by focusing on
genes with large expression ratios. We have found, however, that effort
to refine experimental protocols and repeat experiments increases the
reliability of the data and permits new applications. For example, it
provides a larger set of genes at higher confidence levels that serve
as a more unique signature for a given protein perturbation. In
addition, it allows subtle signatures to be detected, when, for
example, a protein is only partially inhibited. This may enable
clinical monitoring of small changes in protein function in disease or
toxicity states before they could otherwise be detected. Because the
functions of many genes detected on transcript arrays are known, these
microarrays are powerful tools that provide detailed information about
a cell's physiology. For example, changes in the flux through a
metabolic pathway are reflected in transcriptional changes in genes in
the pathway.

Furthermore, it may be possible to indirectly measure protein activity
levels from expression profiling data (S.F. et al., unpublished data).
Thus, although the eventual development of genomic methods allowing the
direct measurement of all cellular protein levels will be an important
achievement, transcript array technology offers an immediate and robust
means of evaluating the effects of various treatments on gene
expression and protein function.

Table 3  Yeast strains used

Strain   Relevant genotype                                                                               Reference
YPH499   Mata ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200                                                (34)
R563     Mata ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1 his3::HIS3                             (this study)
R558     Mata ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1 fpr1::HIS3                             (this study)
R567     Mata ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1 cph1::HIS3                             (this study)
MCY300   Mata ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1 cna1Δ1::hisG cna2Δ1::HIS3              (21)
R132     Mata ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1 cna1Δ1::hisG cna2Δ1::HIS3 cph1::kanr   (this study)
R133     Mata ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1 cna1Δ1::hisG cna2Δ1::HIS3 fpr1::kanr   (this study)
R559     Mata ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1 his3::HIS3 gcn4::LEU2                  (this study)
BY4719   Mata trp1-Δ63 ura3-Δ0                                                                            (35)
BY4738   Matα trp1-Δ63 ura3-Δ0                                                                            (35)
R491     Mata/α BY4719 × BY4738                                                                           (this study)
BY4728   Mata his3-Δ200 trp1-Δ63 ura3-Δ0                                                                  (35)
BY4729   Matα his3-Δ200 trp1-Δ63 ura3-Δ0                                                                  (35)
R1226    Mata/α BY4728 × BY4729                                                                           (this study)

Methods

Construction, growth and drug treatment of yeast strains. The strains
used in this study (Table 3) were constructed by standard techniques.
To construct strain R559, strain R563 was transformed to Leu+ with
plasmid pM12 digested by SalI and MluI (provided by A. Hinnebusch and
T. Dever). Strains R132 and R133 were constructed by transforming the
bacterial kanamycin resistance cassette flanked by genomic DNA from the
CPH1 and FPR1 loci, respectively, and selecting for G418-resistant
colonies. For experiments with FK506, cells were grown for three
generations to a density of 1 × 10[exponent illegible] cells/ml in YAPD
medium (YPD plus 0.004% adenine) supplemented with 10 mM calcium
chloride as described. Where indicated, FK506 was added to a final
concentration of 1 µg/ml 0.5 h after inoculation of the culture or to
50 µg/ml 1 h before cells were collected. CsA was used at a final
concentration of 50 µg/ml. Cells were broken by standard procedures
with the following modifications: Cell pellets were resuspended in
breaking buffer (0.2 M Tris HCl pH 7.6, 0.5 M NaCl, 10 mM EDTA, 1%
SDS), vortexed for 2 min on a VWR multi-tube vortexer at setting 8 in
the presence of 60% glass beads (425-600 µm mesh; Sigma) and
phenol:chloroform (50:50, volume/volume). After separation of the
phases, the aqueous phase was re-extracted and ethanol-precipitated.
Poly A+ RNA was isolated by two sequential chromatographic
purifications over oligo dT cellulose (New England Biolabs, Beverly,
Massachusetts) using established protocols.

For experiments using 3-AT, wild-type or his3/his3 cells were grown to
early logarithmic phase in SC medium, pelleted and resuspended in SC
medium lacking histidine for 1 h in the presence or absence of 10 mM
3-AT, as indicated. Cells were harvested and mRNA isolated as above.
FK506 was obtained from the Swedish Hospital Pharmacy (Seattle,
Washington) and purified to homogeneity by ethyl acetate extraction by
J. Simon (Fred Hutchinson Cancer Research Center, Seattle, Washington).
CsA was obtained from Alexis Biochemicals (San Diego, California); 3-AT
was from Sigma.

Preparation and hybridization of the labeled sample.
Fluorescently-labeled cDNA was prepared, purified and hybridized
essentially as described. Cy3- or Cy5-dUTP (Amersham) was incorporated
into cDNA during reverse transcription (Superscript II; Life
Technologies) and purified by concentrating to less than 10 µl using
Microcon-30 microconcentrators (Amicon, Houston, Texas). Paired cDNAs
were resuspended in 20-26 µl hybridization solution (3× SSC, 0.75 µg/ml
polyA DNA, 0.2% SDS) and applied to the microarray under a 22- × 30-mm
coverslip for 6 h at 63 °C, all according to a published method.

Fabrication and scanning of microarrays. PCR products containing
common 5' and 3' sequences (Research Genetics, Huntsville, Alabama)
were used as templates with amino-modified forward primer and
unmodified reverse primers to PCR amplify 6,065 ORFs from the
S. cerevisiae genome. Our first-pass success rate was 94%.
Amplification reactions that gave products of unexpected sizes were
excluded from subsequent analysis. ORFs that could not be amplified
from purchased templates were amplified from genomic DNA. DNA samples
from 100-µl reactions were isopropanol-precipitated, resuspended in
water, brought to a final concentration of 3× SSC in a total volume of
15 µl, and transferred to 384-well microtiter plates (Genetix Limited,
Christchurch, Dorset, England). PCR products were spotted onto
1 × 3-inch polylysine-treated glass slides by a robot built essentially
according to defined specifications
(http://cmgm.stanford.edu/pbrown/MGuide). After being printed, slides
were processed according to published protocols.

Microarrays were imaged on a prototype multi-frame CCD camera in
development at Applied Precision (Issaquah, Washington). Each CCD
image frame was approximately 2-mm square. Exposure times of 2 s in
the Cy5 channel (white light through Chroma 618-648 nm excitation
filter, Chroma 657-727 nm emission filter) and 1 s in the Cy3 channel
(Chroma 535-560 nm excitation filter, Chroma 570-620 nm emission
filter) were done consecutively in each frame before moving to the
next, spatially contiguous frame. Color isolation between the Cy3 and
Cy5 channels was about 100:1 or better. Frames were 'knitted' together
in software to make the complete images. The intensities of spots
(about 100 µm) were quantified from the 10-µm pixels by frame-by-frame
background subtraction and intensity averaging in each channel. Dynamic
range of the resulting spot intensities was typically a ratio of 1,000
between the brightest spots and the background-subtracted additive
error level. Normalization between the channels was accomplished by
normalizing each channel to the mean intensities of all genes. This
procedure is nearly equivalent to normalization between channels using
the intensity ratio of genomic DNA spots, but is possibly more robust,
as it is based on the intensities of several thousand spots distributed
over the array.
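
The channel normalization described here can be illustrated with a minimal Python sketch. It assumes background-subtracted spot intensities are already available as two equal-length lists; the function name `normalize_channels` and the toy intensities are hypothetical.

```python
def normalize_channels(cy5, cy3):
    """Scale each channel by its own mean spot intensity, then return the
    per-spot Cy5/Cy3 expression ratios."""
    mean5 = sum(cy5) / len(cy5)
    mean3 = sum(cy3) / len(cy3)
    return [(r / mean5) / (g / mean3) for r, g in zip(cy5, cy3)]


# Toy data: the Cy5 channel is uniformly twice as bright as Cy3, as might
# happen with a longer exposure. After normalization no spot looks regulated.
cy3 = [100.0, 200.0, 400.0, 800.0]
cy5 = [2.0 * g for g in cy3]
ratios = normalize_channels(cy5, cy3)
print(ratios)
```

Because the scale factor comes from all spots rather than a few control spots, a handful of aberrant spots has little influence on it, which is in line with the robustness argument made in the text.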

Signature correlation coefficients and their confidence limits.
Correlation coefficients between the signature ORFs of various
experiments were calculated using:

$$\rho = \frac{\sum_k x_k y_k}{\left(\sum_k x_k^2 \, \sum_k y_k^2\right)^{1/2}}$$

where x_k is the log10 of the expression ratio for the kth gene in the
x signature, and y_k is the log10 of the expression ratio for the kth
gene in the y signature. The summation is over those genes that were
either up- or down-regulated in either experiment at the 95% confidence
level. These genes each had a less than 5% chance of being actually
unregulated (having expression ratios departing from unity due to
measurement errors alone). This confidence level was assigned based on
an error model which assigns a lognormal probability distribution to
each gene's expression ratio with characteristic width based on the
observed scatter in its repeated measurements (repeated arrays at the
same nominal experimental conditions) and on the individual array
hybridization quality. This latter dependence was derived from control
experiments in which both Cy3 and Cy5 samples were derived from the
same RNA sample. For large numbers of repeated measurements the error
reduces to the observed scatter. For a single measurement the error is
based on the array quality and the spot intensity.

Random measurement errors in the x and y signatures tend to bias the
correlation towards zero. In most experiments, most genes are not
significantly affected but do show small random measurement errors.
Selecting only the '95% confidence' genes for the correlation
calculation, rather than the entire genome, reduces this bias and makes
the actual biological correlations more apparent.
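
A minimal Python sketch of the signature correlation follows. It assumes the coefficient is the uncentered form, ρ = Σ x_k y_k / (Σ x_k² Σ y_k²)^(1/2), summed over genes significant in either experiment, as the definitions above suggest. The function name, the gene labels and the numbers are hypothetical, and the significance sets are simply given rather than derived from an error model.

```python
import math


def signature_correlation(x, y, significant_x, significant_y):
    """x, y: dicts mapping gene -> log10 expression ratio.
    significant_x, significant_y: sets of genes regulated at the 95%
    confidence level in each experiment. The sum runs over their union."""
    genes = [g for g in significant_x | significant_y if g in x and g in y]
    sxy = sum(x[g] * y[g] for g in genes)
    sxx = sum(x[g] ** 2 for g in genes)
    syy = sum(y[g] ** 2 for g in genes)
    return sxy / math.sqrt(sxx * syy)


# Toy signatures: g3 is unregulated noise and is excluded from the sum.
x = {"g1": 1.0, "g2": -0.5, "g3": 0.02}
y = {"g1": 2.0, "g2": -1.0, "g3": -0.01}
rho = signature_correlation(x, y, {"g1", "g2"}, {"g1"})
rho_opposite = signature_correlation(
    x, {g: -v for g, v in y.items()}, {"g1", "g2"}, {"g1"}
)
print(rho, rho_opposite)
```

In the toy data the y signature is exactly twice the x signature over the significant genes, so the coefficient is at its maximum; flipping the sign of y gives the minimum.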

Correlations between a profile and itself are unity by definition.
Error limits on the correlation are 95% confidence limits based on the
individual measurement error bars, and assuming uncorrelated errors.
They do not include the bias mentioned above; thus, a departure of ρ
from unity does not necessarily mean that the underlying biological
correlation is imperfect. However, a correlation of 0.7 ± 0.1, for
example, is very significantly different from zero. Small (magnitude of
ρ < 0.2) but formally significant correlations in the tables and text
probably are due to small systematic biases in the Cy5/Cy3 ratios that
violate the assumption of independent measurement errors used to
generate the 95% confidence limits. Therefore, these small correlation
values should be treated as not significant. A likely source of
uncorrected systematic bias is the partially corrected scanner detector
nonlinearity that differently affects the Cy3 and Cy5 detection
channels.

The 1 µg/ml FK506 treatment signature was compared with more than 40
unrelated deletion mutant strain or drug signatures. These control
profiles had correlation coefficients with the FK506 profile that were
distributed around zero (mean ρ = -0.03) with a standard deviation of
0.16 (data not shown), and none had correlations greater than ρ = 0.38.
Similarly, the calcineurin mutant strain signature correlated well with
the CsA treatment signature (ρ = 0.71 ± 0.04) but not with the
signatures from the negative controls (mean ρ = -0.02 with a standard
deviation of 0.18).

Quality controls. End-to-end checks on expression ratio measurement
accuracy were provided by analyzing the variance in repeated
hybridizations using the same mRNA labeled with both Cy3 and Cy5, and
also using Cy3 and Cy5 mRNA samples isolated from independent cultures
of the same nominal strain and conditions. Biases undetected with this
procedure, such as gene-specific biases presumably due to differential
incorporation of Cy3- and Cy5-dUTP into cDNA, were minimized by doing
hybridizations in fluor-reversed pairs, in which the Cy3/Cy5 labeling
of the biological conditions was reversed in one experiment with
respect to the other. The expression ratio for each gene is then the
ratio of ratios between the two experiments in the pair. Other biases
are removed by algorithmic numerical de-trending. The magnitude of
these biases in the absence of de-trending and fluor reversal is
typically about 30% in the ratio, but may be as high as twofold for
some ORFs.

Expression ratios are based on mean intensities over each spot. Some
smaller spots have fewer image pixels in the average. This does not
degrade accuracy noticeably until the number of pixels falls below ten,
in which case the spot is rejected from the data set. 'Wander' of spot
positions with respect to the nominal grid is adaptively tracked in
array regions by the image processing software. Unequal spot 'wander'
within a subregion greater than half-a-spot spacing is a difficulty for
the automated quantitating algorithms; in this case, the spot is
rejected from analysis based on human inspection of the 'wander'. Any
spots partially overlapping are excluded from the data set. Less than
1% of spots typically are rejected for these reasons.
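
The effect of the fluor-reversed pairs described under Quality controls can be shown with a small worked example in Python. The sketch assumes a multiplicative, gene-specific dye bias and takes the square root of the ratio of ratios to return to the scale of a single ratio; that square root, the function name and the numbers are assumptions for illustration, not details given in the text.

```python
import math


def fluor_reversed_ratio(r_forward, r_reversed):
    """Combine the Cy5/Cy3 ratios from a fluor-reversed pair of
    hybridizations. A dye bias that multiplies both ratios cancels in the
    ratio of ratios; the square root is an assumed rescaling."""
    return math.sqrt(r_forward / r_reversed)


true_ratio = 2.0   # hypothetical treated/control expression ratio
dye_bias = 1.3     # hypothetical 30% gene-specific Cy5/Cy3 bias

r_forward = true_ratio * dye_bias    # treated sample labeled with Cy5
r_reversed = dye_bias / true_ratio   # labels swapped: control in Cy5
combined = fluor_reversed_ratio(r_forward, r_reversed)
print(combined)
```

Either single hybridization would misreport the ratio by the dye bias, while the combined value recovers the assumed true ratio.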
formed under the following conditions: 300 CA-NC, 1 M NaCl and 50 mM
tris-HCl (pH 8.0) at 37°C for 60 min. In the absence of exogenous RNA,
neither cones nor cylinders formed at concentrations of 0.5 M NaCl or
below. Absorption spectra demonstrated that our CA-NC preparations were
not contaminated with Escherichia coli RNA (estimated lower detection
limit was ~1 base/protein molecule). To control for even lower levels
of RNA contamination, we preincubated the CA-NC protein with 0.5 mg/ml
ribonuclease A (Type 1-AS, 54 Kunitz U/mg, Sigma) for 1 hour at 4°C,
which then formed cones normally.

The Transcriptional Program in the Response of Human Fibroblasts to
Serum

Vishwanath R. Iyer, Michael B. Eisen, Douglas T. Ross, Greg Schuler,
Troy Moore, Jeffrey C. F. Lee, Jeffrey M. Trent, Louis M. Staudt,
James Hudson Jr., Mark S. Boguski, Deval Lashkari, Dari Shalon,
David Botstein, Patrick O. Brown*

The temporal program of gene expression during a model physiological
response of human cells, the response of fibroblasts to serum, was
explored with a complementary DNA microarray representing about 8600
different human genes. Genes could be clustered into groups on the
basis of their temporal patterns of expression in this program. Many
features of the transcriptional program appeared to be related to the
physiology of wound repair, suggesting that fibroblasts play a larger
and richer role in this complex multicellular response than had
previously been appreciated.



The response of mammalian fibroblasts to 
serum has been used as a model for studying 
growth control and cell cycle progression (/). 
Normal human fibroblasts require growth 
factors for proliferation in culture; these 
growth factors arc usually provided by fetal 
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bovine senim (FBS). In the absence of 
growth factors, fibroblasts enter a nondivid- 
ing state, termed Go, characterized by low 



metabolic activity. Addition of FBS or puri- 
fied growth faaors induces proliferation of 
the fibroblasts; the changes in gene exprcs- 
sion that accompany this proliferative re- 
sponse have been the subject of many studies* 
and the responses of dozens of genes to se- 
nim have been characterized. 

We took a fresh look at the response of human fibroblasts to serum, using
cDNA microarrays representing about 8600 distinct human genes to observe
the temporal program of transcription that underlies this response.
Primary cultured fibroblasts from human neonatal foreskin were induced to
enter a quiescent state by serum deprivation for 48 hours and then
stimulated by addition of medium containing 10% FBS (2). DNA microarray
hybridization was used to measure the temporal changes in mRNA levels of
8613 human genes (3) at 12 times, ranging from 15 min to 24 hours after
serum stimulation. The cDNA made from purified mRNA from each sample was
labeled with the fluorescent dye Cy5 and mixed with a common reference
probe consisting of cDNA made from purified mRNA from the quiescent
culture (time zero) labeled with a second fluorescent dye, Cy3 (4). The
color images of the hybridization results (Fig. 1) were made by
representing the Cy3 fluorescent image as green and the Cy5 fluorescent
image as red and merging the two color images.

Fig. 1. The same section of the microarray is shown for three independent
hybridizations comparing RNA isolated at the 8-hour time point after
serum treatment to RNA from serum-deprived cells. Each microarray
contained 9996 elements, including 9804 human cDNAs, representing 8613
different genes. mRNA from serum-deprived cells was used to prepare cDNA
labeled with Cy3-deoxyuridine triphosphate (dUTP), and mRNA harvested
from cells at different times after serum stimulation was used to prepare
cDNA labeled with Cy5-dUTP. The two cDNA probes were mixed and
simultaneously hybridized to the microarray. The image of the subsequent
scan shows genes whose mRNAs are more abundant in the serum-deprived
fibroblasts (that is, suppressed by serum treatment) as green spots and
genes whose mRNAs are more abundant in the serum-treated fibroblasts as
red spots. Yellow spots represent genes whose expression does not vary
substantially between the two samples. [...] spots representing the
following genes: 1, protein disulfide isomerase-related protein P5; 2,
IL-8 precursor; 3, EST AA057170; and 4, vascular endothelial growth
factor.
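The two-color comparison described above (a Cy5-labeled sample measured against a Cy3-labeled reference, displayed as red, green, or yellow spots) can be illustrated with a minimal sketch. The function names, the twofold cutoff, and the intensity floor are assumptions made for illustration only; the source does not state the thresholds used to render the images.

```python
import math


def log2_ratio(cy5, cy3, floor=1.0):
    """Return log2 of the Cy5 (sample) to Cy3 (reference) intensity ratio.

    `floor` is an assumed guard against zero or negative
    background-subtracted intensities; it is not taken from the source.
    """
    return math.log2(max(cy5, floor) / max(cy3, floor))


def spot_colour(cy5, cy3, fold=2.0):
    """Classify a spot as it would appear in the merged red/green image.

    The twofold cutoff is a hypothetical choice for this sketch.
    """
    cutoff = math.log2(fold)
    ratio = log2_ratio(cy5, cy3)
    if ratio >= cutoff:
        return "red"      # more abundant in the serum-treated sample
    if ratio <= -cutoff:
        return "green"    # more abundant in the serum-deprived reference
    return "yellow"       # little change between the two samples


# Hypothetical (Cy5, Cy3) intensities for three spots.
examples = [(4000, 1000), (1000, 4000), (1000, 1100)]
colours = [spot_colour(cy5, cy3) for cy5, cy3 in examples]
```

Working on the log scale treats a fourfold induction and a fourfold repression symmetrically (+2 and -2), which is why ratio data of this kind are usually log-transformed before comparison.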

Diverse temporal profiles of gene expression could be seen among the 8613
genes surveyed in this experiment (Fig. 2); many of these genes (about
half) were unnamed expressed sequence tags (ESTs) (5). Although diverse
patterns of expression were observed, the orderly choreography of the
expression program became apparent when the results were analyzed by a
clustering and display method developed in our laboratory for analyzing
genome-wide gene expression data (6). An example of such an analysis,
here applied to a subset of 517 genes whose expression changed
substantially in response to serum (7), is shown in Fig. 2. The entire
detailed data set underlying Fig. 2 is available as a tab-delimited table
(in cluster order) at the Science Web site
(www.sciencemag.org/feature/data/984559.shl). In addition, the entire,
larger data set for the complete set of genes analyzed in this experiment
can be found at a Web site maintained by our laboratory
(genome-www.stanford.edu/serum) (8).

Fig. 2. Cluster image showing the different classes of gene expression
profiles. Five hundred seventeen genes whose mRNA levels changed in
response to serum stimulation were selected (7). This subset of genes was
clustered hierarchically into groups on the basis of the similarity of
their expression profiles by the procedure of Eisen et al. (6). The
expression pattern of each gene in this set is displayed here as a
horizontal strip. For each gene, the ratio of mRNA levels in fibroblasts
at the indicated time after serum stimulation ("unsync" denotes
exponentially growing cells) to its level in the serum-deprived (time
zero) fibroblasts is represented by a color, according to the color scale
at the bottom. The graphs show the average expression profiles for the
genes in the corresponding "cluster" (indicated by the letters A to J and
color coding). In every case examined, when a gene was represented by
more than one array element, the multiple representations in this set
were seen to have identical or very similar expression profiles, and the
profiles corresponding to these independent measurements clustered either
adjacent or very close to each other, pointing to the robustness of the
clustering algorithm in grouping genes with very similar patterns of
expression.
[Fig. 2 graphic: average expression profile for cluster A (100 genes);
time axis 0 hr, 1 hr, 6 hr, 16 hr, unsync; color scale, fold induction
>2, >4, >6, >8.]
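The idea of grouping genes by the similarity of their expression profiles can be sketched as follows. This is a simplified, assumed version (average-linkage agglomerative clustering on Pearson correlation, written in plain Python); it is not presented as the exact procedure of reference (6), and the gene names and values are invented for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (non-constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cluster_order(profiles):
    """Return gene names in the leaf order of an average-linkage tree.

    `profiles` maps a gene name to its list of log ratios over time.
    At each step the two most similar clusters are merged, so genes with
    similar profiles end up next to each other in the returned order.
    """
    clusters = [[name] for name in profiles]

    def similarity(c1, c2):
        pairs = [pearson(profiles[a], profiles[b]) for a in c1 for b in c2]
        return sum(pairs) / len(pairs)

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                s = similarity(clusters[i], clusters[j])
                if best is None or s > best[0]:
                    best = (s, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0]


# Hypothetical log-ratio profiles at four time points.
profiles = {
    "early_1": [0.0, 2.0, 0.5, 0.0],
    "late_1": [0.0, 0.1, 1.0, 2.0],
    "early_2": [0.1, 1.8, 0.4, 0.1],
    "late_2": [0.0, 0.0, 1.2, 2.2],
}
order = cluster_order(profiles)
```

In this toy input the two transiently induced profiles and the two late-induced profiles end up adjacent, mirroring the observation in the Fig. 2 legend that multiple array elements for the same gene clustered next to or very close to each other.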

One measure of the reliability of the changes we observed is inherent in
the expression profiles of the genes. For most genes whose expression
levels changed, we could see a gradual change over a few time points,
which thus effectively provided independent measurements for almost all
of the observations. An additional check was provided by the inclusion of
duplicate and, in a few cases, multiple array elements representing the
same gene for about 5% of the genes included in this microarray. In
addition, three independent hybridizations to different microarrays with
mRNA samples from cells harvested 8 hours after serum addition showed
good correlation (Fig. 1). As an independent test, we measured the
expression levels of several genes using the TaqMan 5' nuclease
fluorigenic quantitative polymerase chain reaction (PCR) assay (9). The
expression profiles of the genes, as measured by these two independent
methods, were very similar (Fig. 3) (10).

The transcriptional response of fibroblasts to serum was extremely rapid.
The immediate response to serum stimulation was dominated by genes that
encode transcription factors and other proteins involved in signal
transduction. The mRNAs for several genes [including c-FOS, JUN B, and
mitogen-activated protein (MAP) kinase phosphatase-1 (MKP1)] were
detectably induced within 15 min after serum stimulation (Fig. 4, A and
B). Fifteen of the genes that were observed to be induced by serum encode
known or suspected regulators of transcription (Fig. 4B). All but one
were immediate-early genes; their induction was not inhibited by
cycloheximide (11). This class of genes could be distinguished into those
whose induction was transient (Fig. 2, cluster E) and those whose mRNA
levels remained induced for much longer (Fig. 2, clusters I and J). Some
features of the immediate response appeared to be directed at adaptation
to the initiating signals. We observed a marked induction of mRNA
encoding MKP1, a dual-specificity phosphatase that modulates the activity
of the ERK1 and ERK2 MAP kinases (12). The coincidence of the peak of
expression of genes in cluster E (Fig. 2) with that of MKP1 (Fig. 4A)
suggests the possibility that continued activity of the MAP kinase
pathway is required to maintain induction of these genes but not of those
with sustained expression (clusters I and J). The gene encoding a second
member of the dual-specificity MAP kinase phosphatase family, known as
dual-specificity protein phosphatase 6/pyst2, was induced later, at about
4 hours after serum stimulation. Genes encoding diverse other proteins
with roles in signal transduction, ranging from cell-surface receptors
[for example, the sphingosine 1-phosphate receptor (EDG-1), the vascular
endothelial growth factor receptor, and the type II BMP receptor] to
regulators of G-protein signaling (for example, NET1/p115 rho GEF) to
DNA-binding transcription factors, were induced by serum (Fig. 4A).

The reprogramming of the regulatory circuits in response to serum
involved not only induction of transcription factors but also reduced
expression of many transcriptional regulators, some of which may play
roles in maintaining the cells in G0 or in priming them to react to
wounding (Fig. 4C). Perhaps as a consequence of the historical focus on
genes induced by serum stimulation of fibroblasts, the set of
transcription factors whose expression diminished upon serum stimulation
has been less well characterized.

Genes known or likely to be involved in controlling and mediating the
proliferative response showed distinctive patterns of regulation. Several
genes whose products inhibit progression of the cell-division cycle, such
as p27 Kip1, p57 Kip2, and p18, were expressed in the quiescent
fibroblasts and down-regulated before the onset of cell division. The
nadir in the mRNA levels for these genes occurred between 6 and 12 hours
after serum stimulation (Fig. 5A), coincident with the passage of the
fibroblasts through G1. The levels of the transcript encoding the
WEE1-like protein kinase, which is believed to inhibit mitosis by
phosphorylation of Cdc2, diminished between 4 and 8 to 12 hours after
serum addition (Fig. 5A), well before the onset of M phase at around 16
hours, raising the possibility of an additional role for Wee1 in an
earlier stage of the cell cycle or in regulating the G0 to G1 transition.
Several genes induced in the first few hours after serum stimulation,
such as the helix-loop-helix proteins ID2 and ID3 and EST AA016305, a
gene with homology to G1-S cyclins, are candidates for roles in promoting
the exit from G0.

Genes involved in mediating progression through the cell cycle were
characterized by a distinctive pattern of expression (Fig. 2, cluster D),
reflecting the coincidence of their expression with the reentry of the
stimulated fibroblasts into the cell-division cycle. The stimulated
fibroblasts replicated their DNA about 16 hours after serum treatment.
This timing was reflected by the induction of mRNA encoding both subunits
of ribonucleotide reductase and PCNA, the processivity factor for DNA
polymerase epsilon and delta. Cyclin A, Cyclin B1, Cdc2, and CDC28
kinase, regulators of passage through the S phase and the transition from
G2 to M phase, were induced at about 16 to 20 hours after serum addition.
The kinase in the Cyclin B1-CDK pair needs to be activated by
phosphorylation. The gene encoding Cyclin-dependent kinase 7 (CDK7; a
homolog of Xenopus MO15 cdk-activating kinase) was induced in parallel
with the Cdc2 and Cdc28 kinases (Fig. 5A), suggesting a potential role
for CDK7 in mediating M phase. DNA topoisomerase II α, required for
chromosome segregation at mitosis; Mad2, a component of the spindle
checkpoint that prevents completion of mitosis (anaphase) if chromosomes
are not attached to the spindle; and the kinetochore protein CENP-F all
showed a similar expression profile.

In the hours after the serum stimulus, one of the most striking features
of the unfolding transcriptional program was the appearance of numerous
genes with known roles in processes relevant to the physiology of wound
healing. These included both genes involved in the direct role played by
fibroblasts in remodeling of the clot and the extracellular matrix and,
more notably, genes encoding proteins involved in intercellular signaling
(Fig. 5). Genes induced in this program encode products that can (i)
participate in the dynamic process of clotting, clot dissolution, and
remodeling and perhaps contribute to hemostasis by promoting local
vasoconstriction (for example, endothelin-1); (ii) promote chemotaxis and
activation of neutrophils (for example, COX2) and recruitment and
extravasation of monocytes and macrophages (for example, MCP1); (iii)
promote chemotaxis and activation of T lymphocytes [for example,
interleukin-8 (IL-8)] and B lymphocytes (for example, ICAM-1), thus
providing both innate and antigen-specific defenses against wound
infection and recruiting the phagocytic cells that will be required to
clear out the debris during remodeling of the wound; (iv) promote
angiogenesis and neovascularization (for example, VEGF) through newly
forming tissue; (v) promote migration and proliferation of fibroblasts
(for example, CTGF) and their differentiation into myofibroblasts (for
example, Vimentin); and (vi) promote migration and proliferation of
keratinocytes, leading to reepithelialization of the wound (for example,
FGF7), and promote proliferation of melanocytes, perhaps contributing to
wound hyperpigmentation (for example, FGF2).

Coordinated regulation of groups of genes whose products act at different
steps in a common process was a recurring theme. For example, Furin, a
prohormone-processing protease required for one of the processing steps
in the generation of active endothelin, was induced in parallel with
induction of the gene encoding the precursor of endothelin-1 (Fig. 5E)
(13). Conversely, expression of CALLA/CD10, a membrane metalloprotease
that degrades endothelin-1 and other peptide mediators of acute
inflammation, was reduced. A second example is provided by a set of five
genes involved in the biosynthesis of cholesterol (Fig. 5I). The mRNAs
encoding each of these enzymes showed sharply diminished expression
beginning 4 to 6 hours after serum stimulation of fibroblasts. A likely
explanation for the coordinated down-regulation of the cholesterol
biosynthetic pathway is that serum provides cholesterol to fibroblasts
through low-density lipoproteins, whereas in the absence of the
cholesterol provided by serum, endogenous cholesterol biosynthesis in
fibroblasts is required.

Fig. 3. Independent verification of microarray quantitation. Relative
mRNA levels of the indicated genes (Mast, mast/stem cell growth factor
receptor) were measured with the TaqMan 5' nuclease fluorigenic
quantitative PCR assay (9) (left) in the same samples that were used to
prepare probes for microarray hybridizations (right). Data from the
TaqMan analysis were normalized to mRNA concentrations and plotted
relative to the level at time zero, so that the results could be compared
with those from the microarray hybridizations. In general, quantitation
with the two methods gave very similar results (10).

Many of the previously studied genes that we observed to be regulated in
this program have no recognized role in any aspect of wound healing or
fibroblast proliferation. Their identification in this study may
therefore point to previously unknown aspects of these processes. A few
selected genes in this group are shown in Fig. 5H. The stanniocalcin
gene, for example (Fig. 5H), encodes a secreted protein without a clearly
identified function in human cells (14, 15). Its induction in
serum-stimulated fibroblasts suggests the possibility that it may play a
role in the wound-healing process, perhaps serving as a signal in
mediating inflammation or angiogenesis.

One of the most important results of this exploration was the discovery
of over 200 previously unknown genes whose expression was regulated in
specific temporal patterns during the response of fibroblasts to serum.
For example, 13 of the 40 genes in cluster D (Fig. 2) have descriptive
names that reflect their putative function. Nine of these 13 genes (69%)
encode proteins that play roles in cell cycle progression, particularly
in DNA replication and the G2-M transition. This enrichment for cell
cycle-related genes suggests that some of the unnamed genes in this
cluster (for example, EST W79311 and EST R13146, neither of which has
sequence similarity to previously characterized genes) may represent
previously unknown genes involved in this part of the cell cycle.
Similarly, a remarkable fraction of genes that were grouped into cluster
F on the basis of their expression profiles encoded proteins involved in
intercellular signaling (Fig. 2), suggesting that a similar role should
be considered for the many unnamed genes in this cluster. A
disproportionately large fraction of the genes whose transcription
diminished upon serum stimulation were unnamed ESTs.

Our intention was to use this experiment as a model to study the control
of the transition from G0 to a proliferating state. However, one of the
defining characteristics of genome-scale expression profiling experiments
is that the examination of so many diverse genes opens a window on all
the processes that actually occur and not merely the single process one
intended to observe. Serum, the soluble fraction of clotted blood, is
normally encountered by cells in vivo in the context of a wound. Indeed,
the expression program that we observed in response to serum suggests
that fibroblasts are programmed to interpret the abrupt exposure to serum
not as a general mitogenic stimulus but as a specific physiological
signal, signifying a wound. The proliferative response that we originally
intended to study appeared to be part of a larger physiological response
of fibroblasts to a wound. Other features of the transcriptional response
to serum suggest that the fibroblast is an active participant in a
conversation among the diverse cells that work together in wound repair,
interpreting, amplifying, modifying, and broadcasting signals controlling
inflammation, angiogenesis, and epithelial regrowth during the response
to an injury.

Fig. 4. "Reprogramming" of fibroblasts. Expression profiles of genes
whose function is likely to play a role in the reprogramming phase of the
response are shown with the same representation as in Fig. 2. In the
cases in which a gene was represented by more than one element in the
microarray, all measurements are shown. The genes were grouped into
categories on the basis of our knowledge of their most likely role. Some
genes with pleiotropic roles were included in more than one category.

We recognize that these in vitro results almost certainly represent a
distorted and incomplete rendering of the normal physiological response
of a fibroblast to a wound. Moreover, only the responses elicited
directly by exposure of fibroblasts to serum were examined. The
subsequent signals from other cellular participants in the normal
wound-healing process would certainly provoke further evolution of the
transcriptional program in fibroblasts at the site of a wound, which this
experiment cannot reveal. Nevertheless, we believe that the picture that
emerged strongly suggests a much larger and richer role for the
fibroblast in the orchestration of this important physiological process
than had previously been suspected.

Systematic variation in gene expression
patterns in human cancer cell lines



Douglas T. Ross, Uwe Scherf, Michael B. Eisen, Charles M. Perou,
Christian Rees, Paul Spellman, Vishwanath Iyer, Stefanie S. Jeffrey,
Matt Van de Rijn, Mark Waltham, Alexander Pergamenschikov,
Jeffrey C.F. Lee, Deval Lashkari, Dari Shalon, Timothy G. Myers,
John N. Weinstein, David Botstein & Patrick O. Brown

We used cDNA microarrays to explore the variation in expression of
approximately 8,000 unique genes among the 60 cell lines used in the
National Cancer Institute's screen for anti-cancer drugs. Classification
of the cell lines based solely on the observed patterns of gene
expression revealed a correspondence to the ostensible origins of the
tumours from which the cell lines were derived. The consistent
relationship between the gene expression patterns and the tissue of
origin allowed us to recognize outliers whose previous classification
appeared incorrect. Specific features of the gene expression patterns
appeared to be related to physiological properties of the cell lines,
such as their doubling time in culture, drug metabolism or the interferon
response. Comparison of gene expression patterns in the cell lines to
those observed in normal breast tissue or in breast tumour specimens
revealed features of the expression patterns in the tumours that had
recognizable counterparts in specific cell lines, reflecting the tumour,
stromal and inflammatory components of the tumour tissue. These results
provided a novel molecular characterization of this important group of
human cell lines and their relationships to tumours in vivo.



Introduction

Cell lines derived from human tumours have been extensively used as
experimental models of neoplastic disease. Although such cell lines
differ from both normal and cancerous tissue, the inaccessibility of
human tumours and normal tissue makes it likely that such cell lines will
continue to be used as experimental models for the foreseeable future.
The National Cancer Institute's Developmental Therapeutics Program (DTP)
has carried out intensive studies of 60 cancer cell lines (the NCI60)
derived from tumours from a variety of tissues and organs. The DTP has
assessed many molecular features of the cells related to cancer and
chemotherapeutic sensitivity, and has measured the sensitivities of these
60 cell lines to more than 70,000 different chemical compounds, including
all common chemotherapeutics (http://dtp.nci.nih.gov). A previous
analysis of these data revealed a connection between the pattern of
activity of a drug and its method of action. In particular, there was a
tendency for groups of drugs with similar patterns of activity to have
related methods of action.

We used DNA microarrays to survey the variation in abundance of
approximately 8,000 distinct human transcripts in these 60 cell lines.
Because of the logical connection between the function of a gene and its
pattern of expression, the correlation of gene expression patterns with
the variation in the phenotype of the cell can begin the process by which
the function of a gene can be inferred. Similarly, the patterns of
expression of known genes can reveal novel phenotypic aspects of the
cells and tissues studied. Here we present an analysis of the observed
patterns of gene expression and their relationship to phenotypic
properties of the 60 cell lines. The accompanying report explores the
relationship between the gene expression patterns and the drug
sensitivity profiles measured by the DTP. The assessment of gene
expression patterns in a multitude of cell and tissue types, such as the
diverse set of cell lines we studied here, under diverse conditions in
vitro and in vivo, should lead to increasingly detailed maps of the human
gene expression program and provide clues as to the physiological roles
of uncharacterized genes. The databases, plus tools for analysis and
visualization of the data, are available
(http://genome-www.stanford.edu/nci60 and http://discover.nci.nih.gov).

Results 

We studied gene expression in the 60 cell lines using DNA microarrays
prepared by robotically spotting 9,703 human cDNAs on glass microscope
slides. The cDNAs included approximately 8,000 different genes:
approximately 3,700 represented previously characterized human proteins,
an additional 1,900 had homologues in other organisms and the remaining
2,400 were identified only by ESTs. Due to ambiguity of the identity of
the cDNA clones used in these studies, we estimated that approximately
80% of the genes in these experiments were correctly identified. The
identities of approximately 3,000 cDNAs



[Author affiliations, largely illegible in the source scan; recoverable
fragments: Incyte Pharmaceuticals, Fremont, California, USA; Genometrix
Inc., The Woodlands, Texas, USA; [...] Program, Division of Cancer
Treatment and Diagnosis, [...]]



nature genetics • volume 24 • march 2000






article 



© 2000 Nature America Inc. • http://genetics.nature.com



[Fig. 1 graphic; legible panel labels: ovarian, leukaemia, colon.]

Fig. 1 Gene expression patterns related to the tissue of origin of the
cell lines. Two-dimensional hierarchical clustering was applied to
expression data from a set of 1,161 cDNAs measured across 64 cell lines.
The 1,161 cDNAs were those (of 9,703 total) [...] weighted for the gene
clustering so that [...] correlation coefficient represented by the
length of the [...] tightly together and were well differentiated from
[...] indicating that the clustering of cell lines is based on [...] gene
expression patterns rather than [...] representation of the data table,
with [...] order. The dendrogram representing [...] omitted for clarity,
but is available (http://genome-www.stanford.edu/[...]). [...] cell of
this table reflects the [...] (column). The colour scale used to
represent [...]




from these experiments have been sequence-verified, including
all of those referred to here by name.

Each hybridization compared Cy5-labelled cDNA reverse transcribed from
mRNA isolated from one of the cell lines with Cy3-labelled cDNA reverse
transcribed from a reference mRNA sample. This reference sample, used in
all hybridizations, was prepared by combining an equal mixture of mRNA
from 12 of the cell lines (chosen to maximize diversity in gene
expression as determined primarily from two-dimensional gel studies). By
comparing cDNA from each cell line with a common reference, variation in
gene expression across the 60 cell lines could be inferred from the
observed variation in the normalized Cy5/Cy3 ratios across the
hybridizations.

To assess the contribution of artefactual sources of variation in [...]
expression patterns, K562 and MCF7 cell lines were each grown in three
independent cultures and the entire process was carried out independently
on mRNA extracted from each culture. The variance in the triplicate
fluorescence ratio measurements approached a minimum when the
fluorescence signal was greater than approximately 0.4% of the measurable
total signal dynamic range above background in either channel of the
hybridization. We selected the subset of spots for which significant
signal was present in both the numerator and denominator of the ratios by
this criterion to identify the best-measured spots. The pair-wise
correlation coefficients for the triplicates of the set of genes that
passed this quality control level (6,992 spots included for the MCF7
samples and 6,[...] spots for K562) ranged from 0.83 to 0.92 (for graphs
and details see http://genome-www.stanford.edu/nci60).
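The quality filter and triplicate comparison described above can be sketched as follows. This is an illustrative outline only: the data layout, the function names, the 16-bit dynamic range, and the intensities are assumptions, and the sketch assumes background-subtracted intensities and identical spot identifiers in every replicate.

```python
from itertools import combinations
from math import log2, sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def replicate_correlations(replicates, dynamic_range, fraction=0.004):
    """Filter spots by signal, then correlate log ratios between replicates.

    `replicates` is a list of dicts mapping spot id -> (cy5, cy3).
    A spot is kept only if both channels exceed `fraction` of the dynamic
    range (0.4% here, following the text) in every replicate.
    Returns the kept spot ids and one correlation per replicate pair.
    """
    cutoff = fraction * dynamic_range
    kept = [spot for spot in replicates[0]
            if all(min(rep[spot]) > cutoff for rep in replicates)]
    log_ratios = [[log2(rep[s][0] / rep[s][1]) for s in kept]
                  for rep in replicates]
    return kept, [pearson(a, b) for a, b in combinations(log_ratios, 2)]


# Hypothetical triplicate measurements; spot "D" has very low signal.
rep1 = {"A": (4000, 1000), "B": (1000, 4000), "C": (2000, 2000),
        "D": (100, 50), "E": (3000, 1500)}
rep2 = {"A": (4200, 1000), "B": (1000, 3800), "C": (2100, 2000),
        "D": (120, 60), "E": (2900, 1500)}
rep3 = {"A": (3900, 1000), "B": (1100, 4000), "C": (1900, 2000),
        "D": (90, 40), "E": (3100, 1500)}

kept, correlations = replicate_correlations([rep1, rep2, rep3], 65535)
```

Requiring adequate signal in both channels matters because a ratio with a near-background numerator or denominator is dominated by noise, which would depress the replicate correlations regardless of the underlying biology.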

To make the orderly features in the data more apparent, we used a
hierarchical clustering algorithm and a pseudo-colour visualization
matrix. The object of the clustering was to group cell lines with similar
repertoires of expressed genes and to group genes whose expression level
varied among the 60 cell lines in a [...] subsets of genes to assess the
robustness of the analysis. In one case (Fig. 1), we concentrated on
those genes that showed the most variation in expression among the 60
cell lines (1,167 total). A second analysis (Fig. 2) included all spots
that were thought to be well measured in the reference set (6,831 spots).

Gene expression patterns related to the histologic origins of the cell lines

The most notable property of the clustered data was that cell lines with
common presumptive tissues of origin grouped together (Figs 1 and 2).
Cell lines derived from leukaemia, melanoma, central nervous system,
colon, renal and ovarian tissue were clustered into independent terminal
branches specific to their respective organ types with few exceptions.
Cell lines derived from non-small lung carcinoma and breast tumours were
distributed in multiple different terminal branches, suggesting that
their gene expression patterns were more heterogeneous.

Many of these coherent cell line clusters were distinguished by the
specific expression of characteristic groups of genes (Fig. 3a-d). For
example, a cluster of approximately 90 genes was highly expressed in the
melanoma-derived lines (Fig. 3f). This set was enriched for genes with
known roles in melanocyte biology, including tyrosinase and dopachrome
tautomerase (TYR and [...] enzyme complex involved in melanin synthesis),
MART1 (MLANA; which is being investigated as a target for immunotherapy
of melanoma) and S100-β (S100B; which has been used as an antigenic
marker in the diagnosis of [...] been noted to lack melanin and other
markers useful for identification of melanoma cells. Paradoxically, two
related cell lines (MDA-MB435 and MDA-N), which were derived from a
single patient with breast cancer and have been conventionally regarded
as breast cancer [...], shared expression of the genes associated with
melanoma. MDA-MB435 was isolated from a pleural effusion in a patient
with metastatic ductal adenocarcinoma of the breast. It remains possible
that the origin of the cell line was a breast cancer, and that its gene
expression pattern is related to the neuroendocrine features of some
breast cancers. But our results suggest that this cell line may have
originated from a melanoma, raising the possibility that the patient had
a co-existing occult melanoma.

[Facing-column text, fragmentary in the source scan: [...] of 1,159 genes
(Fig. 2[...]) included [...] are necessary for progression [...]
MAD2L1), RNA pro[...] RNA helicases [...] factors) and [...]]

Fig. 2 Gene expression patterns related to other cell-line phenotypes.
a, We applied two-dimensional hierarchical clustering to expression data
from a set of 6,831 cDNAs measured across the 64 cell lines. The 6,831
cDNAs were those with a minimum fluorescence signal intensity of
approximately 0.4% of the dynamic range above background in the reference
channel in each of the six hybridizations used to establish
reproducibility. This effectively selected those spots that provided the
most reliable ratio measurements and therefore identified a subset of
genes useful for exploring patterns comprised of those whose variation in
expression across the 60 cell lines was of moderate magnitude.
b, Cluster-ordered data table. c, Doubling time of cell lines. Cell lines
are given in cluster order. Values are plotted relative to the mean.
Doubling times greater than the mean are shown in green, those with
doubling times less than the mean are shown in red. d, Three related gene
clusters that were enriched for genes whose expression level variation
was correlated with cell line proliferation rate. Each of the three gene
clusters (clustered solely on the basis of their expression patterns)
showed enrichment for sets of genes involved in distinct functional
categories (for example, ribosomal genes versus genes involved in pre-RNA
splicing). e, Gene cluster in which all characterized and
sequence-verified cDNAs encode genes known to be regulated by
interferons. f, Gene cluster enriched for genes that have been implicated
in drug metabolism (indicated by asterisks). A further property of the
gene clustering evident here and in Fig. 2 is the strong tendency for
redundant representations of the same gene to cluster immediately
adjacent to one another, even within larger groups of genes with very
similar expression patterns. In addition to illustrating the
reproducibility and consistency of the measurements, and providing
independent confirmation of many of our measurements, this property also
demonstrates that these, and probably all, genes have nearly unique
patterns of variation across the 60 cell lines. If this were not the
case, and multiple genes had identical patterns of variation, we would
not expect to be able to distinguish, by clustering on the basis of
expression variation, duplicate copies of individual genes from the other
genes with identical expression patterns.

The higher-level organization of the cell-line tree, in which groups span
cell lines from different tissue types, also reflected shared biological
properties of the tissues from which the cell lines were derived. The
carcinoma-derived cell lines were divided into major branches that
separated those that expressed genes characteristic of epithelial cells
from those that expressed genes more typical of stromal cells. A cluster
of genes is shown (Fig. 3) that is most strongly expressed in cell lines
derived from colon carcinomas, six of seven ovarian-derived cell lines
and the two


Within this large cluster were smaller [...] with more specialized roles.
One cluster was highly enriched for numerous ribosomal genes, whereas
another was more enriched for genes encoding RNA-splicing factors. The
variation in expression of these ribosomal genes was significantly
correlated with variation in the cell doubling time (correlation
coefficient of 0.54), supporting the notion that the genes in this
cluster were [...]

In a smaller gene cluster (Fig. 2d), all of the named genes were 
previously known to be regulated by interferons. Groups of 
interferon-regulated genes showed distinct patterns of expression, 
suggesting that the NCI60 cell lines exhibited variation in activity of 
interferon-response pathways, which was reflected in gene expression 
patterns. Another cluster (Fig. 2c) contained several genes encoding 



breast cancer lines positive for the oestrogen receptor. Genes in this 
cluster have been implicated in several aspects of epithelial cell 
biology. The cluster was enriched for genes whose products are known to 
localize to the basolateral membrane of epithelial cells, including 
those encoding components of adherens complexes (for example, 
desmoplakin (DSP), periplakin (PPL) and plakoglobin (JUP)), an 
epithelial-expressed cell-cell adhesion molecule (M4S1) and a 
sodium/hydrogen ion exchanger (SLC9A1). It also contained genes that 
encode putative transcriptional regulators of epithelial morphogenesis, 
a human homologue of a Drosophila melanogaster epithelial-expressed 
tumour suppressor (LLGL1) and a homeobox gene thought to control 
calcium-mediated adherence in epithelial cells (MSX2). 

In contrast, a separate, major branch of the cell-line dendrogram 
(Fig. 1a) included all glioblastoma-derived cell lines, all 
renal-cell-carcinoma-derived cell lines and the remaining 
carcinoma-derived lines. The characteristic set of genes expressed in 
this cluster included many whose products are involved in stromal cell 
functions (Fig. 3d). Indeed, the two cell lines originally described as 
'sarcoma-like' in appearance (Hs578T, breast carcinosarcoma, and SF539, 
gliosarcoma) expressed most of these genes. Although no single gene was 
uniformly characteristic of this cluster, each cell line showed a 
distinctive pattern of expression of genes encoding proteins with roles 
in synthesis or modification of the extracellular matrix (for example, 
caldesmon, thrombospondin (THBS), lysyl oxidase (LOX) and collagen 
subtypes). Although the ovarian and most non-small-cell-lung-derived 
carcinomas expressed genes characteristic of both epithelial cells and 
stromal cells, they probably clustered with the CNS and renal cell 
carcinomas in this analysis because genes characteristically expressed 
in stromal cells were more abundantly represented in this gene set. 



doxin frXNl .^A .w T" synthesis), thiore- 

OQXin (TOJ) and thioredoxin reducuse (TXNRDl; enzymes 
mvoKed in regulating redox state in ceUs) and SSpi^^ 
transporter known to efficiently transport glutathione-conh^^ 
gated compounds"). The devated expr^ion of tlT^of^^^^ 



Physiological variation reflected in gene expression patterns 

A cluster diagram of 6,831 genes (Fig. 2) is useful for exploring 
clusters of genes whose variation in mRNA levels was not obviously 
attributable to cell or tissue type. We identified some gene clusters 
that were enriched for genes involved in specific cellular 
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Cell lines facilitate interpretation of gene expression 
patterns in complex clinical samples 
Like many other types of cancer, tumours of the breast typically have a 
complex histological organization, with connective tissue and 
leukocytic infiltrates interwoven with tumour cells. To explore the 
possibility that variation in gene expression in the tumour cell lines 
might provide a framework for interpreting the expression patterns in 
tumour specimens, we compared RNA isolated from two breast cancer 
biopsy samples, a sample of normal breast tissue and the NCI60 cell 
lines derived from breast cancers (excluding MDA-MB-435 and MDA-N) and 
leukaemias (Fig. 4). This clustering highlighted features of the gene 
expression patterns shared between the tumour specimens and cell lines 
derived from breast cancers and leukaemias. 
The genes encoding keratin 8 (KRT8) and keratin 19 (KRT19) were 
expressed in the biopsy samples and the two breast-derived cell lines, 
MCF-7 and T47D, suggesting that these transcripts originated in tumour 
cells with features similar to those of luminal epithelial cells 
(Fig. 5a). Expression of a set of genes characteristic of stromal 
cells, including several collagen genes (among them COL3A1 and COL6A1) 
and the smooth muscle cell marker TAGLN, was a feature shared by the 
tumour sample and the stromal-like cell lines Hs578T and BT549 
(Fig. 5b). This feature of the expression pattern seen in the tumour 
samples is likely to be due to the stromal component of the tumour. The 
tumours also shared expression of a set of genes (Fig. 5c) with the 
multiple myeloma cell line (RPMI-8226), notably including 
immunoglobulin genes, consistent with the presence of B cells in the 
tumour (this was confirmed by staining with anti- 
Fig. 3 Gene clusters related to tissue characteristics in the cell 
lines, showing gene clusters enriched for genes expressed in cell lines 
of ostensibly similar origin (panels a to c). 
immunoglobulin antibodies; data not shown). Therefore distinct sets of 
genes with co-varying expression among the samples (Fig. 4, arrow) 
appear to represent distinct cell types that can be distinguished in 
breast cancer tissue. A fourth cluster of genes, more highly expressed 
in all of the cell lines than in any of the clinical specimens, was 
enriched for genes present in the proliferation cluster described above 
(Fig. 5d). The variation in expression of these genes likely paralleled 
the difference in proliferation rate between the rapidly cycling 
cultured cell lines and the much more slowly dividing cells in tissues. 

Discussion 

Newly available genomics tools allowed us to explore variation in gene 
expression on a genomic scale in 60 cell lines derived from diverse 
tumour tissues. We used a simple cluster analysis to identify the 
prominent features in the gene expression patterns that appeared to 
reflect 'molecular signatures' of the tissue from which the cells 
originated. The histological characteristics of the cell lines that 
dominated the clustering were pervasive enough that similar 
relationships were revealed when alternative subsets of genes were 
selected for analysis. Additional features of the expression pattern 
may be related to variation in physiological attributes such as 
proliferation rate and activity of interferon-response pathways. 

The properties of the tumour-derived cell lines in this study have 
presumably all been shaped by selection for resistance to host defences 
and chemotherapeutics and for rapid proliferation in the tissue culture 
environment of synthetic growth media, fetal bovine serum and a 
polystyrene substratum. But the primary identifiable factor accounting 
for variation in gene expression patterns among these 60 cell lines was 
the identity of the tissue from which each cell line was ostensibly 
derived. For most of the cell lines we examined, neither physiological 
nor experimental adaptation for growth in culture was sufficient to 
overwrite the gene expression programs established during 
differentiation in vivo. Nevertheless, the prominence of mesenchymal 
features in the cell lines isolated from glioblastomas and carcinomas 
may reflect a selection for the relative ease of establishment of cell 
lines expressing stromal characteristics, perhaps combined with 
physiological adaptation to tissue culture conditions. 
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cnce. vecmeoi. . lymph node mtun^h tnm^tZ^Z^ 

.nd .he NC.60 b,e.« .nd ^'^^^'^^.^trSZlZT^^^^^^J^ 

9ene..p,e«ionclu„.„„.iy.i,m8.T «<»t«>9u»h«J by the 



Biological themes linking genes with related expression patterns may be 
inferred in many cases from the shared attributes of known genes within 
the clusters. Uncharacterized cDNAs are likely to encode proteins that 
have roles similar to those of the known gene products with which they 
appear to be co-regulated. Still, for several clusters of genes, we 
were unable to discern a common theme linking the identified members of 
the cluster. Further exploration of their variation in expression under 
more 

o^ o reNn«n°'n"'"P'''''"f • '"^"'ig^^io" of the physiol- 
.h^o/n ""^ P"^'*** ^eht">. The reUtionship of 

measured by the DTP is an example of linking variation in gene 
expression with more subtle and diverse phenotypic variation. 
The patterns of gene expression measured in the NCI60 cell 

«ore«°r / ^""T''' '''•P* t^at 
express specific sets of genes in the histologically complex breast 
cancer specimens. Although it is now feasible to analyse gene 
expression in micro-dissected tumour specimens, this observation 
suggests that it will be possible to explore and interpret some of the 
biology of clinical tumour samples by sampling them intact. As is 
useful in conventional morphological pathology, one might be able to 
observe interactions between a tumour and its microenvironment in this 
way. These relationships will be clarified by suitable analysis of gene 
expression patterns from intact as well as dissected tumours. 

Methods 

clones used in these experiments as bacterial colonies in 96-well 
microtitre plates. Approximately 8,000 distinct Unigene clusters 
(representing nominally unique genes) were represented in this set of 
clones. 

Sb v*' ' "T" '"dependen. cDNA clon« 

ostensibly representing the same gene had nearly identical gene 
expression patterns. A single-pass 3' sequence re-verification was 
attempted for each clone after re-streaking for single colonies. For a 
subset of genes (or 

LiZT °^ 5- sequence «rifi«.ion 

ZT^"°JT P^""" °f «P'~"n (888 total). 331 w^ ^r" 

rec. y identified. 57. incorrecUy identified, and 500. indeterminateTl^r 

3 000 clones have been verified. The fuD list of clones used and their nomi- 
(S«nfo"rd',H "T'"^ """" <««igna.ion "SI^- 

IS h . ? have no. ye. been 

verified; hitp://genome-www.sianford.edu:8000/nci60). 

Production of cDNA microarrays. The arrays used in this experiment were 
produced at Synteni Inc. (now Incyte Pharmaceuticals). Each insert was 
amplified from a bacterial colony by sampling 1 µl of bacterial media 
and performing PCR with primers for the three plasmids represented in 
the clone set (5'-TTGTAAAACGACGGCCAGTG-3', 5'-CACACAGGAAACAGCTATG-3'). 
3×SSC (10 µl). The PCR products were then printed on treated glass 
microscope slides using a robot with four printing tips. Detailed protocols 
for isstmUmt »P<"«jnf • nucroarray primer, and printing and exper- 

ro'iXpr^r ''^^ """"'"^ 

Preparation of mRNA and reference pool. Cell lines were grown from NCI 
OTfrctten..odc,in RPMI.lMO«,pplemen.^ 
UmM)a™l5% fetal calf sertj™.Tomi™™i„U,econm^^^^^^ 
in culture conditions or cell density to differential gene expression, 
we grew each cell line to 80% confluence and isolated mRNA 24 h after 
transfer to fresh medium. The time between removal from the incubator 
and lysis of the cells in RNA stabilization buffer was minimized 
(<1 min). Cells were lysed in buffer containing guanidium 
isothiocyanate and total RNA was purified with the RNeasy purification 
kit (Qiagen). We purified 



using a poly(A) purification kit (Oligotcx. Oiaeml ^rnMJn- . .w 

mtegnty and relattve contaminaUon of mRNA with nWmal rJJT^ 

J he breasi lumourj were surcicaUv cirii*<i j 
transported to the pa^ology i^li^orSr^TrnXT m"^^ 
analyse were q„,ckly fro»n in liquid nitrogen and stored at 
me. A frozen tumour specunen was removed from the freerer. cut^to 

small pieces (-50-100 mg each), immediately placed into I(M^ mhif Tri 

zol reagent (GiTko-BRI.) and homogenized Hg a 

Homogenuer (Fisher Scientific), starting at SCMO rom ^ .v 

increasing to -20.000 r.p.m. o J, period'of 3o!^. W^ p^^^'j^ 

zol/iumour homogenate as desoibed in the Tri«.l 

initial step to remove fat. Once totS ' f "''l'^"* " 
We combined mRNA from the following cells in equal quantities to 
make the reference pool: HL-60 (acute myeloid leukaemia) and K562 
(chronic myeloid leukaemia); NCI-H226 (non-small-cell-lung); COLO 
205 (colon); SNB-19 (central nervous system); LOX-IMVI (melanoma); 

Z^Vu\^^^^^ ^ ^"'"^^ ^"^'^ PC-^ ^nd 

MCF7 and Hs578T (breast). The criteria for selection of the cell lines 
in the reference are described in detail in the accompanying manuscript. 

Doubling-time calculations. We calculated doubling times based on 
routine NCI60 cell line compound screening data, and they reflect the 
doubling times for cells inoculated into 96-well plates at the 
screening inoculation densities and grown in RPMI 1640 medium 
supplemented with 5% fetal bovine serum for 48 h. We measured cell 
populations using the sulforhodamine B optical density measurement 
assay. The doubling time constant k was calculated using the equation 
N/N0 = e^(kt), where N0 is optical density for control (untreated) 
cells at time zero, N is optical density for control cells after 48-h 
incubation, and t is 48 h. The same equation was then used with the 
derived k to calculate the doubling time t by setting N/N0 = 2. For a 
given cell line, we obtained N0 and N values by averaging optical 
densities (N>6,000) obtained for each cell line for a year's screening. 
Data and experimental details are available (http://dtp.nci.nih.gov). 
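
The doubling-time calculation can be sketched in a few lines of Python. This is a minimal illustration only, assuming exponential growth of the form N/N0 = e^(kt) with natural logarithms; the function names and the example optical densities are hypothetical and are not taken from the screening data.

```python
import math


def growth_constant(n0: float, n: float, t_hours: float = 48.0) -> float:
    """Solve N/N0 = exp(k * t) for k, given optical densities N0 and N."""
    if n0 <= 0 or n <= 0 or t_hours <= 0:
        raise ValueError("optical densities and time must be positive")
    return math.log(n / n0) / t_hours


def doubling_time(n0: float, n: float, t_hours: float = 48.0) -> float:
    """Time at which N/N0 = 2 for the fitted k, that is ln(2) / k."""
    k = growth_constant(n0, n, t_hours)
    if k <= 0:
        raise ValueError("no net growth; doubling time is undefined")
    return math.log(2) / k


if __name__ == "__main__":
    # Hypothetical values: optical density rises fourfold in 48 h,
    # which corresponds to a doubling time of about 24 h.
    print(doubling_time(n0=0.25, n=1.0))
```

A fourfold increase over 48 h is two doublings, so the sketch should return a value close to 24 h for the hypothetical input above.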



at http://rana.stanford.edu/software). Each spot was defined by manual 
positioning of a grid of circles over the image. For each image, the 
average pixel intensity within each circle was determined, and a local 
background was computed for each spot equal to the median pixel 
intensity in a square of 40 pixels in width and height centred on the 
spot centre, excluding all pixels within any defined spots. Net signal 
was determined by subtraction of this local background from the average 
intensity for each spot. Spots deemed unsuitable for accurate 
quantitation because of array artefacts were manually flagged and 
excluded from further analysis. Data files generated by ScanAlyze were 
entered into a custom database that maintains web-accessible files. 
Signal intensities between the two fluorescent images were normalized 
by applying a uniform scale factor to all intensities measured for the 
Cy5 channel. The normalization factor was chosen so that the mean 
log(Cy3/Cy5) for a subset of spots that achieved a minimum quality 
parameter (approximately 6,000 spots) was 0. This effec- 
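
The background subtraction and channel normalization described in this paragraph can be illustrated as follows. This is a sketch under stated assumptions, not the ScanAlyze implementation: pixel lists are passed in directly, the quality-filtered subset of spots is assumed to have been selected already, and all function names are hypothetical.

```python
import math
from statistics import mean, median


def net_signal(spot_pixels, background_pixels):
    """Average spot intensity minus the median of the local background."""
    return mean(spot_pixels) - median(background_pixels)


def cy5_scale_factor(cy3, cy5):
    """Uniform factor s such that mean(log(cy3 / (s * cy5))) equals 0.

    cy3 and cy5 are net signals for the quality-filtered spots only.
    """
    log_ratios = [math.log(a / b) for a, b in zip(cy3, cy5)]
    return math.exp(mean(log_ratios))


def normalize_cy5(cy3, cy5):
    """Apply the uniform scale factor to every Cy5 intensity."""
    s = cy5_scale_factor(cy3, cy5)
    return [b * s for b in cy5]


if __name__ == "__main__":
    cy3 = [100.0, 200.0, 400.0]  # hypothetical net signals
    cy5 = [50.0, 50.0, 100.0]
    scaled = normalize_cy5(cy3, cy5)
    print(mean(math.log(a / b) for a, b in zip(cy3, scaled)))
```

Because the factor is the exponential of the mean log ratio, the mean log(Cy3/Cy5) after scaling is zero up to floating-point rounding.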



Preparation and hybridization of fluorescent labelled cDNA. For each 
comparative array hybridization, labelled cDNA was synthesized by 
reverse transcription from test cell mRNA in the presence of Cy5-dUTP 
and from the reference mRNA with Cy3-dUTP, using the Superscript II 
reverse-transcription kit (Gibco-BRL). For each reverse transcription 
reaction, mRNA (2 µg) was mixed with an anchored oligo-dT 
(d-20T-d(AGC)) primer (4 µg) in a total volume of 15 µl, heated to 
70 °C for 10 min and cooled on ice. To this sample, we added an 
unlabelled nucleotide pool (0.6 µl; 25 mM each dATP, dCTP, dGTP, and 
15 mM dTTP), either Cy3 or Cy5 conjugated dUTP (3 µl; 1 mM; Amersham), 
5× first-strand buffer (6 µl; 250 mM Tris-HCl, pH 8.3, 375 mM KCl, 
15 mM MgCl2), 0.1 M DTT (3 µl) and 2 µl of Superscript II reverse 
transcriptase (200 U/µl). After a 2-h incubation at 42 °C, the RNA was 
degraded by adding 1 N NaOH (1.5 µl) and incubating at 70 °C for 
10 min. The mixture was neutralized by adding 1 N HCl (1.5 µl), and the 
volume brought to 500 µl with TE (10 mM Tris, 1 mM EDTA). We added Cot1 
human DNA (20 µg; Gibco-BRL), and purified the probe by centrifugation 
in a Centricon-30 micro-concentrator (Amicon). The two separate probes 
were combined, brought to a volume of 500 µl and concentrated again to 
a volume of less than 7 µl. We added 10 µg/µl poly(A) RNA (1 µl; Sigma) 
and tRNA (10 µg/µl; Gibco-BRL) and adjusted the volume to 9.5 µl with 
distilled water. For final probe preparation, 20×SSC (2.1 µl; 1.5 M 
NaCl, 150 mM NaCitrate, pH 8.0) and 10% SDS (0.35 µl) were added to a 
total final volume of 12 µl. The probes were denatured by heating for 
2 min at 100 °C, incubated at 37 °C for 20-30 min, and placed on the 
array under a 22 mm × 22 mm glass coverslip. We incubated slides 
overnight at 65 °C for 14-18 h in a custom slide chamber with humidity 
maintained by a small reservoir of 3×SSC. Arrays were washed by 
submersion and agitation for 2-5 min in 2×SSC with 0.1% SDS, followed 
by 1×SSC and then 0.1×SSC. The arrays were "spun dry" by centrifugation 
for 2 min in a slide-rack in a Beckman GS-6 tabletop centrifuge in 
Microplus carriers at 650 r.p.m. 



Cluster analysis. We extracted tables (rows of genes, columns of 
individual microarray hybridizations) of normalized fluorescence ratios 
from the database. Various selection criteria, discussed in relation to 
each data set, were applied to select subsets of genes from the 9,703 
cDNA elements on the arrays. Before clustering and display, the 
logarithms of the measured fluorescence ratios for each gene were 
centred by subtracting the arithmetic mean of all ratios measured for 
that gene. The centring makes all subsequent analyses independent of 
the amount of each gene's mRNA in the reference pool. 
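
The centring step can be sketched as follows. This is an illustrative example only: the text does not state the base of the logarithm, so base 2 is assumed here, and the function name and toy ratios are hypothetical.

```python
import math


def centre_log_ratios(table):
    """Centre log-ratios per gene.

    table maps a gene name to its list of fluorescence ratios, one per
    array. Returns log2 ratios with the per-gene mean subtracted.
    The base-2 logarithm is an assumption of this sketch.
    """
    centred = {}
    for gene, ratios in table.items():
        logs = [math.log2(r) for r in ratios]
        gene_mean = sum(logs) / len(logs)
        centred[gene] = [x - gene_mean for x in logs]
    return centred


if __name__ == "__main__":
    # The same gene measured against a reference pool containing eight
    # times less of its mRNA: every ratio is multiplied by 8.
    low_ref = centre_log_ratios({"g": [8.0, 16.0, 32.0]})["g"]
    high_ref = centre_log_ratios({"g": [1.0, 2.0, 4.0]})["g"]
    print(low_ref, high_ref)
```

Multiplying every ratio for a gene by a constant adds the same offset to each log-ratio, and that offset cancels when the mean is subtracted, which is why the centred values no longer depend on the amount of that gene's mRNA in the reference pool.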

We applied a hierarchical clustering algorithm separately to the cell 
lines and genes using the Pearson correlation coefficient as the 
measure of similarity and average linkage clustering. The results of 
this process are two dendrograms (trees), one for the cell lines and 
one for the genes, in which very similar elements are connected by 
short branches, and longer branches join elements with diminishing 
degrees of similarity. For visual display the rows and columns in the 
initial data table were reordered to follow the dendrograms obtained 
from the cluster analysis. Each cell in the cluster-ordered data table 
was replaced by a graded colour (pure red through black to pure green), 
representing the mean-adjusted ratio. Gene labels are displayed here 
only for genes that were represented in the microarray by 
sequence-verified cDNAs. A complete software implementation of this 
process is available (http://rana.stanford.edu/software), as well as 
all clustering results (http://genome-www.stanford.edu/nci60). 
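
A small pure-Python sketch of hierarchical clustering with Pearson correlation and average linkage is given below. It is not the software referred to above. Average linkage is taken here in the sense of the mean pairwise similarity between clusters, which is an assumption, and the function names and toy profiles are hypothetical. Profiles with zero variance are not handled.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_linkage(profiles):
    """Cluster named profiles; return the tree as nested 2-tuples."""
    # Each cluster is (subtree, list of member names).
    clusters = [(name, [name]) for name in profiles]
    sim = {frozenset((a, b)): pearson(profiles[a], profiles[b])
           for a, b in combinations(profiles, 2)}

    def cluster_sim(c1, c2):
        # Mean similarity over all pairs spanning the two clusters.
        pairs = [sim[frozenset((a, b))] for a in c1[1] for b in c2[1]]
        return sum(pairs) / len(pairs)

    while len(clusters) > 1:
        # Merge the most similar pair of clusters.
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: cluster_sim(clusters[ij[0]],
                                              clusters[ij[1]]))
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


def leaf_order(tree):
    """Left-to-right leaf order, used to reorder rows or columns."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaf_order(left) + leaf_order(right)


if __name__ == "__main__":
    toy = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8.5],
           "C": [4, 3, 2, 1], "D": [8, 6, 4.5, 2]}
    print(leaf_order(average_linkage(toy)))
```

Running the same routine once on rows (genes) and once on columns (cell lines) yields the two leaf orders used to rearrange the data table for display.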



Array quantitation and data processing. Following hybridization, arrays 
were scanned using a laser-scanning microscope (ref. 17; 
http://cmgm.stanford.edu/pbrown). Separate images were acquired for Cy3 
and Cy5. We carried out data reduction with the program ScanAlyze 
(M.B.E., available 
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differentially expressed genes in healthy and diseased subjects 

Cross Reference to Related Applications: 
This application is a continuation-in-part application of U.S. Serial No. 
08/195,485 filed February 14, 1994, the contents of which are incorporated herein by 
reference. 

Field of the Invention 

The present invention relates to the use of immobilized 
oligonucleotide/polynucleotide or polynucleotide sequences for the identification, 
sequencing and characterization of genes which are implicated in disease, infection, 
or development and the use of such identified genes and the proteins encoded thereby 
in diagnosis, prognosis, therapy and drug discovery. 


Background of the Invention 

Identification, sequencing and characterization of genes, especially 
human genes, is a major goal of modern scientific research. By identifying genes, 
determining their sequences and characterizing their biological function, it is possible 
to employ recombinant DNA technology to produce large quantities of valuable "gene 
products", e.g., proteins and peptides. Additionally, knowledge of gene sequences 
can provide a key to diagnosis, prognosis and treatment of a variety of disease states 
in plants and animals which are characterized by inappropriate expression and/or 
repression of selected gene(s) or by the influence of external factors, e.g., carcinogens 
or teratogens, on gene function. The term disease-associated gene(s) is used herein 
in its broadest sense to mean not only genes associated with classical inherited 
diseases, but also those associated with genetic predisposition to disease as well as 
infectious or pathogenic states resulting from gene expression by infectious agents or 
the effect on host cell gene expression by the presence of such a pathogen or its 
products. Locating disease-associated genes will permit the development of 
diagnostic and prognostic reagents and methods, as well as possible therapeutic 
regimens, and the discovery of new drugs for treating or preventing the occurrence of 
such diseases. 

Methods have been described for the identification of certain novel 
gene sequences, referred to as Expressed Sequence Tags (EST) [see, e.g., Adams et 
al., Science, 252:1651-1656 (1991); and International Patent Application No. 
WO93/00353, published January 7, 1993]. Conventionally, an EST is a specific cDNA 
polynucleotide sequence, or tag, about 150 to 400 nucleotides in length, derived from 
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a messenger RNA molecule by reverse transcription, which is a marker for, and 
component of, a human gene actually transcribed in vivo. However, as used herein an 
EST also refers to a genomic DNA fragment derived from an organism, such as a 
microorganism, the DNA of which lacks intron regions. 

A variety of techniques have been described for identifying particular 
gene sequences on the basis of their gene products. For example, several techniques 
are described in the art [see, e.g., International Patent Application No. WO91/07087, 
published May 30, 1991]. Additionally, known methods exist for the amplification of 
desired sequences [see, e.g., International Patent Application No. WO91/17271, 
published November 14, 1991, among others]. 

However, at present, there exist no established methods for filling the 
need in the art for methods and reagents which employ fragments of differentially 
expressed genes of known, unknown (or previously unrecognized) function or 
consequence to provide diagnostic and therapeutic methods and reagents for diagnosis 
and treatment of disease or infection, which conditions are characterized by such 
genes and gene products. It should be appreciated that it is the expression differences 
that are diagnostic of the altered state (e.g., predisease, disease, pathogenic, 
progression or infectious). Such genes associated with the altered state are likely to 
be the targets of drug discovery, whether the genes are the cause or the effect of the 
condition. Identification of such genes provides insight into which gene expression 
needs to be re-altered in order to reestablish the healthy state. 

Summary of the Invention 

In one aspect, the invention provides methods for identifying gene(s) 
which are differentially expressed, for example, in a normal healthy organism and an 
organism having a disease. The method involves producing and comparing 
hybridization patterns formed between samples of expressed mRNA or cDNA 
polynucleotide sequences obtained from either analogous cells, tissues or organs of a 
healthy organism and a diseased organism and a defined set of 
oligonucleotide/polynucleotide sequence probes from either a 
healthy organism or a diseased organism immobilized on a support. Those defined 
oligonucleotide/polynucleotide sequences are representative of the total expressed 
genetic component of the cells, tissues, organs or organism as defined by the collection 
of partial cDNA sequences (ESTs). The differences between the hybridization 
patterns permit identification of those particular EST or gene-specific 
oligonucleotide/polynucleotide sequences associated with differential expression, and 
the identification of the EST permits identification of the clone from which it was 



WO 95/21944    PCT/US95/01863 



derived and using ordinary skill further cloning and, if desired, sequencing of the full- 
length cDNA and genomic counterpart, i.e., gene, from which it was obtained. 

In another aspect, the invention provides methods substantially similar 
to those described above, but which permit identification of those gene(s) of a 
pathogen which are expressed in any biological sample of an infected organism based 
on comparative hybridization of RNA/cDNA samples derived from a healthy versus 
infected organism, hybridized to an oligonucleotide/polynucleotide set representative 
of the gene coding complement of the pathogen of interest. 

In another aspect, the invention provides methods substantially similar 
to those described above, but which permit identification of those EST-specific 
oligonucleotide/polynucleotide sequences of host gene(s) which represent genes being 
differentially expressed/altered in expression by the disease state or infection, and are 
expressed in any biological sample of an infected organism based on comparative 
hybridization of RNA/cDNA samples derived from a healthy versus infected 
organism of interest. 

In a further aspect, the methods described above and in detail below 
also provide methods for diagnosis of diseases or infections characterized by 
differentially expressed genes, the expression of which has been altered as a result of 
infection by the pathogen or disease causing agent in question. All identified 
differences provide the basis for diagnostic testing, be it the altered expression of 
endogenous genes or the patterned expression of the genes of the infecting organism. 
Such patterns of altered expression are defined by comparing RNA/cDNA from the 
two states hybridized against a panel of oligonucleotide/polynucleotides representing 
the expressed gene component of a cell, tissue, organ or organism as defined by its 
collection of ESTs. 

Yet a further aspect of this invention provides a composition suitable 
for use in hybridization, which comprises a solid surface on which is immobilized at 
pre-defined regions thereon a plurality of defined oligonucleotide/polynucleotide 
sequences for hybridization, each sequence comprising a fragment of an EST isolated 
from a cDNA or DNA library prepared from at least one selected tissue or cell 
sample of a healthy (i.e., pre-disease state) animal, at least one analogous sample of 
an animal having a disease, at least one analogous sample of an animal infected with a 
pathogen or the pathogen itself, or any combination or multiple combinations thereof. 

An additional aspect of the invention provides an isolated gene 
sequence which is differentially expressed in a normal healthy animal and an animal 
having a disease, and is identified by the methods above. Similarly, an isolated 
pathogen gene sequence which is expressed in tissue or cell samples of an infected 
animal can be identified by the methods above. 
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Yet another aspect of the invention is that it provides not only a means 
for a static diagnostic but also provides a means for carrying out the procedure over 
time to measure disease progression as well as monitoring the efficacy of disease 
treatment regimes including any toxicological effects thereof. 

Another aspect of the invention is an isolated protein produced by 
expression of the gene sequences identified above. Such proteins are useful in 
therapeutic compositions or diagnostic compositions, or as targets for drug 
development. 

Other aspects and advantages of the present invention are described 
further in the following detailed description of the preferred embodiments thereof. 

Detailed Description of the Invention 

The present invention meets the unfulfilled needs in the art by 
providing methods for the identification and use of gene fragments and genes, even 
those of unknown full length sequence and unknown function, which are 
differentially expressed in a healthy animal and in an animal having a specific disease 
or infection by use of ESTs derived from DNA libraries of healthy and/or 
diseased/infected animals. Employing the methods of this invention permits the 
resulting identification and isolation of such genes by using their corresponding ESTs 
and thereby also permits the production of protein products encoded by such genes. 
The genes themselves and/or protein products, if desired, may be employed in the 
diagnosis or therapy of the disease or infection with which the genes are associated 
and in the development of new drugs therefor. 

It has been appreciated that one or more differentially identified EST 
or gene-specific oligonucleotide/polynucleotides define a pattern of differentially 
expressed genes diagnostic of a predisease, disease or infective state. A knowledge of 
the specific biological function of the EST is not required, only that the EST 
identifies a gene or genes whose altered expression is associated reproducibly with 
the predisease, disease or infectious state. The differences permit the identification of 
gene products altered in their expression by the disease and represent those products 
most likely to be targets of therapeutic intervention. Similarly, the product may be of 
the infecting organism itself and also be an effective target of intervention. 

I. Definitions. 

Several words and phrases used throughout this specification are 
defined as follows: 

As used herein, the term "gene" refers to the genomic nucleotide 
sequence from which a cDNA sequence is derived, which cDNA produces an EST, as 
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described below. The term gene classically refers to the genomic sequence, which, 
upon processing, can produce different cDNAs, e.g., by splicing events. However, 
for ease of reading, any full-length counterpart cDNA sequence which gives rise to an 
EST will also be referred to by shorthand herein as a 'gene'. 

The term "organism" includes without limitation, microbes, plants and 
animals. 

The term "animal" is used in its broadest sense to include all members 
of the animal kingdom, including humans. It should be understood, however, that 
according to this invention the same species of animal which provides the biological 
sample also is the source of the defined immobilized oligonucleotide/polynucleotides 
as defined below. 

The term "pathogen" is defined herein as any molecule or organism 
which is capable of infecting an animal or plant and replicating its nucleic acid 
sequences in the cells or tissues of that animal or plant. Such a pathogen is generally 
associated with a disease condition in the infected animal or plant. Such pathogens 
may include viruses, which replicate intra- or extra-cellularly, or other organisms, 
such as bacteria, fungi or parasites, which generally infect tissues or the blood. 
Certain pathogens or microorganisms are known to exist in sequential and 
distinguishable stages of development, e.g., latent stages, infective stages, and stages 
which cause symptomatic diseases. In these different stages, the pathogens are 
anticipated to express differentially certain genes and/or turn on or off host cell gene 
expression. 

As used herein, the term "disease" or "disease state" refers to any 
condition which deviates from a normal or standardized healthy state in an organism 
of the same species in terms of differential expression of the organism's genes. In 
other words, a disease state can be any illness or disorder be it of genetic or 
environmental origin, for example, an inherited disorder such as certain breast 
cancers, or a disorder which is characterized by expression of gene(s) normally in an 
inactive, 'turned off' state in a healthy animal, or a disorder which is characterized by 
under-expression or no expression of gene(s) which is normally activated or 'turned 
on' in a normal healthy animal. Such differential expression of genes may also be 
detected in a condition caused by infection, inflammation, or allergy, a condition 
caused by development or aging of the animal, a condition caused by administration 
of a drug or exposure of the animal to another agent, e.g., nutrition, which affects 
gene expression. Essentially, the methods described herein can be adapted to detect 
differential gene expression resulting from any cause, by manipulation of the defined 
oligonucleotide/polynucleotides and the samples tested as described below. The 
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concept of disease or disease state also includes its temporal aspects in terms of 
progression and treatment. 

The phrase "differentially expressed" refers to those situations in
which a gene transcript is found in differing numbers of copies, or in activated vs.
inactivated states, in different cell types or tissue types of an organism having a
selected disease, as contrasted to the levels of the gene transcript found in the same
cells or tissues of a healthy organism. Genes may be differentially expressed in
differing states of activation in microorganisms or pathogens in different stages of
development. For example, multiple copies of gene transcripts may be found in an
organism having a selected disease, while only one, or significantly fewer, copies of
the same gene transcript are found in a healthy organism, or vice-versa.

As used herein, the term "solid support" refers to any known substrate
which is useful for the immobilization of large numbers of
oligonucleotide/polynucleotide sequences by any available method to enable
detectable hybridization of the immobilized oligonucleotide/polynucleotide sequences
with other polynucleotide sequences in a sample. Among a number of available solid
supports, one desirable example is the support described in International Patent
Application No. WO91/07087, published May 30, 1991. Also useful are supports such
as, but not limited to, nitrocellulose, nylon, glass, silica and Pall Biodyne C®. It is
also anticipated that improvements yet to be made to conventional solid supports may
also be employed in this invention.

The term "surface" means any generally two-dimensional structure on
a solid support to which the desired oligonucleotide/polynucleotide sequence is
attached or immobilized. A surface may have steps, ridges, kinks, terraces and the
like.

As used herein, the term "predefined region" refers to a localized area
on a surface of a solid support on which is immobilized one or multiple copies of a
particular oligonucleotide/polynucleotide sequence and which enables the
identification of the oligonucleotide/polynucleotide at the position, if hybridization of
that oligonucleotide/polynucleotide to a sample polynucleotide occurs.

The term "immobilized" refers to the attachment of the
oligonucleotide/polynucleotide to the solid support. Means of immobilization are
known and conventional to those of skill in the art, and may depend on the type of
support being used.

By "EST" or "Expressed Sequence Tag" is meant a partial DNA or
cDNA sequence of about 150 to 500, more preferably about 300, sequential
nucleotides of a longer sequence obtained from a genomic or cDNA library prepared
from a selected cell, cell type, tissue or tissue type, organ or organism, which longer
sequence corresponds to an mRNA of a gene found in that library. An EST is
generally DNA. One or more libraries made from a single tissue type typically
provide at least about 3000 different (i.e., unique) ESTs and potentially the full
complement of all possible ESTs representing all cDNAs, e.g., 50,000-100,000 in an
animal such as a human. Further background and information on the construction of
ESTs is described in M. D. Adams et al., Science, 252:1651-1656 (1991); and
International Application Number PCT/US92/05222 (January 7, 1993).

As used herein, the term "defined oligonucleotide/polynucleotide
sequence" refers to a known nucleotide sequence fragment of a selected EST or gene.
This term is used interchangeably with the term "fragments of EST". These
sequential sequences are generally comprised of between about 15 to about 45
nucleotides and more preferably between about 20 to about 25 nucleotides in length.
Thus any single EST of 300 nucleotides in length may provide about 280 different
defined oligonucleotide/polynucleotide sequences of 20 nucleotides in length (e.g.,
20-mers). The lengths of the defined oligonucleotide/polynucleotides may be readily
increased or decreased as desired or needed, depending on the limitations of the solid
support on which they may be immobilized or the requirements of the hybridization
conditions to be employed. The length is generally guided by the principle that it
should be of sufficient length to ensure that it is on average only represented once in
the population to be examined. Generally, these defined
oligonucleotide/polynucleotides are RNA or DNA and are preferably derived from
the anti-sense strand of the EST sequence or from a corresponding mRNA sequence
to enable their hybridization with samples of RNA or DNA. Modified nucleotides
may be incorporated to increase stability and hybridization properties.
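
The figure of about 280 follows from sliding a 20-nucleotide window along a
300-nucleotide EST: 300 - 20 + 1 = 281 windows. A minimal Python sketch of that
tiling, together with the anti-sense (reverse-complement) form of each window, is
given here for illustration only. The function names and the toy sequence are
assumptions and do not appear in the specification.

```python
# Illustrative sketch only: the names and the toy EST are assumptions,
# not taken from the specification.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def tile_oligos(est: str, k: int = 20) -> list[str]:
    """Return every window of k sequential nucleotides in the EST."""
    if k <= 0:
        raise ValueError("k must be positive")
    est = est.upper()
    # A sequence of length n has n - k + 1 windows (none if n < k).
    return [est[i:i + k] for i in range(len(est) - k + 1)]


def antisense(seq: str) -> str:
    """Return the reverse complement (anti-sense strand) of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


toy_est = "ACGT" * 75            # a 300-nucleotide stand-in for an EST
windows = tile_oligos(toy_est)   # 300 - 20 + 1 = 281 overlapping 20-mers
probes = [antisense(w) for w in windows]
```

The toy sequence is repetitive, so its windows are not all distinct; with a real
EST the windows would usually differ from one another.
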
By the term "plurality of defined oligonucleotide/polynucleotide
sequences" is meant the following. A surface of a solid support may immobilize a
large number of "defined oligonucleotide/polynucleotides". For example, depending
upon the nature of the surface, it can immobilize from about 300 to upwards of
60,000 defined 20-mer oligonucleotide/polynucleotides. It is anticipated that future
improvements to solid surfaces will permit considerably larger such pluralities to be
immobilized on a single surface. A "plurality" of sequences refers to the use on any
one solid support of multiple different defined oligonucleotide/polynucleotides from a
single EST from a selected library, as well as multiple different defined
oligonucleotide/polynucleotides from different ESTs from the same library or many
libraries from the same or different tissues, and may also include multiple identical
copies of defined oligonucleotide/polynucleotides. Ultimately a plurality has at least
one oligonucleotide/polynucleotide per expressed gene in the entire organism. For
example, from a library producing about 5,000-10,000 ESTs, a single support can
include at least about 1-20 defined oligonucleotide/polynucleotides representing every
EST in that library. The composition of defined oligonucleotide/polynucleotides
which make up a surface according to this invention may be selected or designed as
desired.
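
The figures in this definition can be checked with simple arithmetic: a surface
holding 60,000 defined 20-mers can carry 12 per EST for a 5,000-EST library and 6
per EST for a 10,000-EST library, both within the stated range of about 1-20. The
short Python sketch below illustrates this; the function name is an assumption
introduced for illustration and is not part of the specification.

```python
# Illustrative arithmetic only; the function name is hypothetical.

def oligos_per_est(surface_capacity: int, n_ests: int) -> int:
    """Largest whole number of defined oligos per EST that fits on one surface."""
    if surface_capacity < 0 or n_ests <= 0:
        raise ValueError("capacity must be >= 0 and n_ests must be > 0")
    return surface_capacity // n_ests


for n_ests in (5_000, 10_000):
    print(n_ests, oligos_per_est(60_000, n_ests))
```
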

The term "sample" is employed in the description of this invention in
several important ways. As used herein, the term "sample" encompasses any cell or
tissue from an organism. Any desired cell or tissue type in any desired state may be
selected to form a sample. For example, the sample cell desired may be a human T
cell; the desired cell type for use in this invention may be a quiescent T cell or an
activated T cell.

By the phrase "analogous sample" or "analogous cell or tissue" is
meant that according to this invention when the ESTs which provide the defined
oligonucleotide/polynucleotides are produced from a cDNA library prepared from a
single tissue or cell type source sample, e.g., liver tissue of a human, then the samples
used to hybridize to those immobilized defined oligonucleotide/polynucleotides are
preferably provided by the same type of sample from either a healthy or diseased
animal, i.e., liver tissue of a healthy human and liver tissue of a diseased or infected
human or from a human suspected of having that disease or infection. Alternatively,
if the surface contains defined oligonucleotide/polynucleotides from multiple cells or
tissues, then the "samples" which are hybridized thereto can be, but are not limited to,
samples obtained from analogous multiple tissues or cells.

The term "detectably hybridizing" means that the sample from the
healthy organism or diseased or infected organism is contacted with the defined
oligonucleotide/polynucleotides on the surface for sufficient time to permit the
formation of patterns of hybridization on the surfaces caused by hybridization
between certain polynucleotide sequences in the samples and certain immobilized
defined oligonucleotide/polynucleotides. These patterns are made detectable by the
use of available conventional techniques, such as fluorescent labelling of the samples.
Preferably hybridization takes place under stringent conditions, e.g., revealing
homologies of about 95%. However, if desired, other less stringent conditions may
be selected. Techniques and conditions for hybridization at selected stringencies are
well known in the art [see, e.g., Sambrook et al., Molecular Cloning. A Laboratory
Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY (1989)].
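
As a rough computational analogue of the 95% figure, the identity between two
aligned sequences of equal length, written in the same sense, can be counted
position by position; for a 20-mer, 95% corresponds to at most one mismatch
(19/20). The Python sketch below is an illustration under stated assumptions. The
function names and the threshold parameter are hypothetical, and actual
hybridization stringency depends on temperature, salt concentration and sequence
composition, which this simple count ignores.

```python
# Illustrative sketch only: a position-by-position identity count is an
# assumed stand-in for hybridization stringency, not a model of it.

def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two equal-length sequences agree."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def passes_stringency(a: str, b: str, threshold: float = 95.0) -> bool:
    """True if the two sequences agree at or above the chosen threshold."""
    return percent_identity(a, b) >= threshold


oligo = "ACGTACGTACGTACGTACGT"          # a 20-mer
one_mismatch = "ACGTACGTACGTACGTACGA"   # differs at the last position only
two_mismatches = "TCGTACGTACGTACGTACGA" # differs at the first and last positions
```
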

II. Compositions of the Invention

The present invention is based upon the use of ESTs from any desired
cell or tissue in known technologies for oligonucleotide/polynucleotide hybridization.

A. ESTs

An EST, as defined above, is, for an animal, a sequence from a
cDNA clone that corresponds to an mRNA. The EST sequences useful in the present
invention are isolated preferably from cDNA libraries using a rapid screening and
sequencing technique. Custom-made cDNA libraries are made using known
techniques. See, generally, Sambrook et al., cited above. Briefly, mRNA from a
selected cell or tissue is reverse transcribed into complementary DNA (cDNA) using
the reverse transcriptase enzyme and made double-stranded using RNase H coupled
with DNA polymerase or reverse transcriptase. Restriction enzyme sites are added to
the cDNA and it is cloned into a vector. The result is a cDNA library. Alternatively,
commercially available cDNA libraries may be used. Libraries of cDNA can also be
generated from recombinant expression of genomic DNA using known techniques,
including polymerase chain reaction-derived techniques.

ESTs (which can range from about 150 to about 500 nucleotides in
length, preferably about 300 nucleotides) can be obtained through sequence analysis
from either end of the cDNA insert. Desirably, the DNA libraries used to obtain
ESTs use directional cloning methods so that either the 5' end of the cDNA (likely to
contain coding sequence) or the 3' end (likely to be a non-coding sequence) can be
selectively obtained.

In general, the method for obtaining ESTs comprises applying
conventional automated DNA sequencing technology to screen clones,
advantageously randomly selected clones, from a cDNA library. The cDNA libraries
from the desired tissue can be preprocessed, or edited, by conventional techniques to
reduce repeated sequencing of high and intermediate abundance clones and to
maximize the chances of finding rare messages from specific cell populations.
Preferably, preprocessing includes the use of defined composition prescreening
probes, e.g., cDNA corresponding to mitochondria, abundant sequences, ribosomes,
actins, myelin basic polypeptides, or any other known high abundance peptide. These
prescreening probes used for preprocessing are generally derived from known ESTs.
Other useful preprocessing techniques include subtraction hybridization, which
preferentially reduces the population of highly represented sequences in the library
[e.g., see Fargnoli et al., Anal. Biochem., 152:364 (1990)] and normalization, which
results in all sequences being represented in approximately equal proportions in the
library [Patanjali et al., Proc. Natl. Acad. Sci. USA, 88:1943 (1991)]. Additional
prescreening/differential screening approaches are known to those skilled in the art.

ESTs can then be generated from partial DNA sequencing of the 
selected clones. The ESTs useful in the present invention are preferably generated 
using low redundancy of sequencing, typically a single sequencing reaction. While 



WO 95/21944



PCT/US95/01863 



single sequencing reactions may have an accuracy as low as 90%, this nevertheless
provides sufficient fidelity for identification of the sequence and design of PCR
primers.

If desired, the location of an EST in a full length cDNA is determined
by analyzing the EST for the presence of coding sequence. A conventional computer
program is used to predict the extent and orientation of the coding region of a
sequence (using all six reading frames). Based on this information, it is possible to
infer the presence of start or stop codons within a sequence and whether the sequence
is completely coding or completely non-coding or a combination of the two. If start
or stop codons are present, then the EST can cover both part of the 5'-untranslated or
3'-untranslated part of the mRNA (respectively) as well as part of the coding
sequence. If no coding sequence is present, it is likely that the EST is derived from
the 3' untranslated sequence due to its longer length and the fact that most cDNA
library construction methods are biased toward the 3' end of the mRNA. It should be
understood that both coding and non-coding regions may provide ESTs equally useful
in the described invention.
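
The six-frame analysis described above can be sketched in a few lines of Python.
This is a minimal illustration, not the conventional computer program referred to
in the text: the function names, the use of the standard stop codons TAA, TAG and
TGA, and the toy sequence are all assumptions. It splits a sequence into codons in
each of the three forward and three reverse-complement frames and reports which
frames contain no in-frame stop codon.

```python
# Illustrative sketch only: names and the toy sequence are assumptions.

STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def six_frame_codons(seq: str) -> dict[str, list[str]]:
    """Map each frame label (+1..+3, -1..-3) to its list of complete codons."""
    frames = {}
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for offset in range(3):
            # Step through the strand three bases at a time, whole codons only.
            codons = [s[i:i + 3] for i in range(offset, len(s) - 2, 3)]
            frames[f"{strand}{offset + 1}"] = codons
    return frames


def frames_without_stop(seq: str) -> list[str]:
    """Frames with no in-frame stop codon (candidate coding frames)."""
    return [
        name
        for name, codons in six_frame_codons(seq).items()
        if not any(codon in STOP_CODONS for codon in codons)
    ]


toy = "ATGGCCAAGTAA"   # start codon, two codons, then a stop in frame +1
```

A frame that is free of stop codons over the whole EST is only a candidate for
coding sequence; a short non-coding stretch can also lack stop codons by chance.
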

A number of specific ESTs suitable for use in the present
invention are described in Adams et al. (supra), which may be incorporated by
reference herein, to describe non-essential examples of desirable ESTs. Other ESTs
exist in the art which may also be useful in this invention, as will ESTs yet to be
developed by these known techniques.

B. Preparing the Solid Support of the Invention 

Oligonucleotide sequences which are fragments of defined
sequence are derived from each EST by conventional means, e.g., conventional
chemical synthesis or recombinant techniques. Each defined
oligonucleotide/polynucleotide sequence as described above is a fragment, which can
be, but is not necessarily, an anti-sense fragment, of an EST isolated from a DNA library
prepared from a selected cell or tissue type from a selected animal. For use in the
present invention, it is presently preferred that the defined
oligonucleotide/polynucleotide sequences are 20-25mers. As described above, for
each EST a number of such 20-25mers may be generated. The lengths may vary as
described above, as may the composition. For example,
oligonucleotide/polynucleotides can be modified based on the Oligo 4.0 or similar
programs to predict hybridization potential or to include modified nucleotides for the
reasons given above. It is also appreciated that large DNA segments may be
employed, including entire ESTs or even full length genes, particularly when inserted
into cloning vectors.

A plurality of these defined oligonucleotide/polynucleotide
sequences are then attached to a selected solid support conventionally used for the
attachment of nucleotide sequences, again by known means. In contrast to other
technologies available in the art, this support is designed to contain defined, not
random, oligonucleotide/polynucleotide sequences. The EST fragments, or defined
oligonucleotide/polynucleotide sequences, immobilized on the solid support can
include fragments of one or more ESTs from a library of at least one selected tissue
or cell sample of a healthy animal, at least one analogous sample of the animal having
a disease, at least one analogous sample of the animal infected with a pathogen, and
any combination thereof.

Numerous conventional methods are employed for attaching
biological molecules such as oligonucleotide/polynucleotide sequences to surfaces of
a variety of solid supports. See, e.g., Affinity Techniques. Enzyme Purification: Part
B, Methods in Enzymology, Vol. 34, ed. W.B. Jakoby, M. Wilcheck, Acad. Press,
NY (1974); Immobilized Biochemicals and Affinity Chromatography, Advances in
Experimental Medicine and Biology, vol. 42, ed. R. Dunlap, Plenum Press, NY
(1974); U.S. Patent No. 4,762,881; U.S. Patent No. 4,542,102; European Patent
Publication No. 391,608 (October 10, 1990); U.S. Patent No. 4,992,127 (Nov. 21,
1989).

One desirable method for attaching
oligonucleotide/polynucleotide sequences derived from ESTs to a solid support is
described in International Application No. PCT/US90/06607 (published May 30,
1991). Briefly, this method involves forming predefined regions on a surface of a
solid support, where the predefined regions are capable of immobilizing ESTs. The
methods make use of binding substances attached to the surface which enable
selective activation of the predefined regions. Upon activation, these binding
substances become capable of binding and immobilizing
oligonucleotide/polynucleotides based on EST or longer gene sequences.

Any of the known solid substrates suitable for binding
oligonucleotide/polynucleotides at pre-defined regions on the surface thereof for
hybridization and methods for attaching the oligonucleotide/polynucleotides thereto
may be employed by one of skill in the art according to this invention. Similarly,
known conventional methods for making hybridization of the immobilized
oligonucleotide/polynucleotides detectable, e.g., fluorescence, radioactivity,
photoactivation, biotinylation, solid state circuitry, and the like may be used in this
invention.

Thus, by resorting to known techniques, the invention provides
a composition suitable for use in hybridization which consists of a surface of a solid
support on which is immobilized at pre-defined regions on said surface a plurality of
defined oligonucleotide/polynucleotide sequences for hybridization. For example,
one composition of this invention is a solid support on which are immobilized oligos
of EST fragments from a library constructed from a single cell type, e.g., a human
stem cell, or a single tissue, e.g., human liver, from a healthy human. Still another
composition of this invention is another solid support on which are immobilized
oligos of EST fragments from a library constructed from a single cell type or a tissue
from a human having a selected disease or predisposition to a selected disease, e.g.,
liver cancer.

Another embodiment of the compositions of this invention
includes a single solid support having oligonucleotides of ESTs from single cell
or single tissue libraries from both a healthy and a diseased human. Still other
embodiments include a single support on which are immobilized oligos of EST
fragments from more than one tissue or cell library from a healthy human or a single
support on which are immobilized more than one tissue or cell library from both
healthy and diseased animals or humans. A preferred composition of this invention is
anticipated to be a single support containing oligos of ESTs for all known cells and
tissues from a selected organism.

III. The Methods of the Invention

A. Identification of Genes 

The present invention employs the compositions described
above in methods for identifying genes which are differentially expressed in a normal
healthy organism and an organism having a disease or infection. These methods may
be employed to detect such genes, regardless of the state of knowledge about the
function of the gene. The method of this invention, by use of the compositions
containing multiple defined EST fragments from a single gene as described above, is
able to detect levels of expression of genes, or in other cases simply the expression or
lack thereof, which differ between normal, healthy organisms and organisms having a
selected disease, disorder or infection.

One such method employs a first surface of a solid support on
which is immobilized at pre-defined regions thereon a plurality of defined
oligonucleotide/polynucleotide sequences, described above, of EST or longer gene
fragment isolated from a cDNA library prepared from at least one selected tissue or
cell sample of a healthy animal (the "healthy test surface") and a second such surface
on which is immobilized at pre-defined regions a plurality of defined
oligonucleotide/polynucleotide sequences of EST or longer gene fragment isolated
from at least one analogous tissue of an animal having a selected disease (the "disease
test surface"). These test surfaces may be standardized for the selected animal or
selected cell or tissue sample from that animal (i.e., they are prescreened for
polymorphisms in the species population).

Polynucleotide sequences are then isolated from mRNA and/or
cDNA from a biological sample from a known healthy animal ("healthy control") and
a second sample is similarly prepared from a sample from a known diseased animal
("disease sample"). These two samples are desirably selected from the cell or tissue
analogous to that which provided the immobilized oligonucleotide/polynucleotides.

According to the method, the healthy control sample is
contacted with one set of the healthy test surface and the disease test surface
described above for a time sufficient to permit detectable hybridization to occur
between the sample and the immobilized defined oligonucleotide/polynucleotides on
each surface. The results of this hybridization are a first hybridization pattern formed
between the nucleotides of the healthy control sample and the healthy test surface and
a second hybridization pattern formed between the nucleotides of the healthy control
sample and the disease test surface.

In a similar manner, the disease sample is detectably hybridized
to another set of healthy test and disease test surfaces, forming a third hybridization
pattern between the disease sample and the healthy test surface and a fourth
hybridization pattern between the disease sample and the disease test surface.

Comparing the four hybridization patterns permits detection of
those defined oligonucleotide/polynucleotides which are differentially expressed
between the healthy control and the disease sample by the presence of differences in
the hybridization patterns at pre-defined regions. The
oligonucleotide/polynucleotides on each surface which correspond to the pattern
differences may be readily identified with the corresponding EST or longer gene
fragment from which the oligonucleotide/polynucleotides are obtained.
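
The comparison of the four hybridization patterns can be illustrated with a short
Python sketch. It is offered under stated assumptions: each pattern is represented
as a mapping from a pre-defined region to a numeric signal, a region is called
different when the signals differ by at least a chosen ratio, and the function
names, the two-fold default ratio and the signal floor are all hypothetical choices
that the specification does not prescribe. The first and third patterns share the
healthy test surface, and the second and fourth share the disease test surface, so
they are compared pairwise.

```python
# Illustrative sketch only: the data model, names, ratio and floor are
# assumptions, not part of the specification.

Pattern = dict[str, float]  # pre-defined region -> hybridization signal


def differing_regions(control: Pattern, disease: Pattern,
                      min_ratio: float = 2.0, floor: float = 1.0) -> dict[str, float]:
    """Regions whose disease/control signal ratio is at least min_ratio either way."""
    changed = {}
    for region in sorted(control.keys() & disease.keys()):
        # The floor avoids division by zero where no signal was detected.
        ratio = max(disease[region], floor) / max(control[region], floor)
        if ratio >= min_ratio or ratio <= 1.0 / min_ratio:
            changed[region] = ratio
    return changed


def compare_four(first: Pattern, second: Pattern, third: Pattern, fourth: Pattern,
                 min_ratio: float = 2.0) -> dict[str, dict[str, float]]:
    """Compare healthy-control and disease-sample patterns surface by surface."""
    return {
        "healthy_test_surface": differing_regions(first, third, min_ratio),
        "disease_test_surface": differing_regions(second, fourth, min_ratio),
    }


# Toy signals for two regions on each surface.
first = {"A1": 100.0, "A2": 50.0}    # healthy control on healthy test surface
third = {"A1": 110.0, "A2": 400.0}   # disease sample on healthy test surface
second = {"B1": 10.0}                # healthy control on disease test surface
fourth = {"B1": 10.0}                # disease sample on disease test surface
result = compare_four(first, second, third, fourth)
```

The same pairwise comparison applies unchanged when both sets of sequences are
immobilized on a single support and only two patterns are produced.
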

In another embodiment of the method of this invention, the
same process is employed, with the exception that the plurality of defined
oligonucleotide/polynucleotide sequences forming the healthy test sample and the
disease test sample surfaces are immobilized on a single solid support. For example,
each fragment of an EST or longer gene fragment on the surface is isolated from at
least two cDNA libraries prepared from a selected cell or tissue sample of a healthy
animal and an analogous selected cell or tissue sample of an animal having a disease.

According to this embodiment, the healthy control sample is
detectably hybridized to a copy of this single solid surface, forming one hybridization
pattern with oligonucleotide/polynucleotides associated with both the healthy and
diseased animal. Similarly, the disease sample is detectably hybridized to a second
copy of this single solid surface, forming one hybridization pattern with
oligonucleotide/polynucleotides associated with both the healthy and diseased animal.

Comparing the two hybridization patterns permits detection of
those defined oligonucleotide/polynucleotides which are differentially expressed
between the healthy control and the disease sample by the presence of differences in
the hybridization patterns at pre-defined regions. The
oligonucleotide/polynucleotides on each surface which correspond to the pattern
differences may be readily identified with the corresponding EST or longer gene
fragment from which the oligonucleotide/polynucleotides are obtained.

The identification of one or more ESTs as the source of the
defined oligonucleotide/polynucleotide which produced a "difference" in
hybridization patterns according to these methods permits ready identification of the
gene from which those ESTs were derived. Because oligonucleotides are of sufficient
length that they will hybridize under stringent conditions only with an RNA/cDNA for
that gene to which they correspond, the oligo can be used to identify the EST and in
turn the clone from which it was derived and, by subsequent cloning, to obtain the
sequence of the full-length cDNA and its genomic counterparts, i.e., the gene, from
which it was obtained.
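
Because each pre-defined region carries a known oligonucleotide, tracing a pattern
difference back to its EST and source clone amounts to a table lookup. The Python
sketch below illustrates this under stated assumptions: the record layout, the
function name and every identifier in the toy layout are hypothetical and are not
taken from the specification.

```python
# Illustrative sketch only: the record fields and all identifiers are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class Spot:
    region: str     # pre-defined region on the surface
    oligo: str      # defined oligonucleotide immobilized there
    est_id: str     # EST from which the oligonucleotide was derived
    clone_id: str   # cDNA clone that was sequenced to give the EST


def trace_regions(layout: list[Spot], regions: set[str]) -> dict[str, set[str]]:
    """Map each EST to its source clone(s) for the regions that showed a difference."""
    traced: dict[str, set[str]] = {}
    for spot in layout:
        if spot.region in regions:
            traced.setdefault(spot.est_id, set()).add(spot.clone_id)
    return traced


layout = [
    Spot("A1", "ACGTACGTACGTACGTACGT", "EST-0001", "clone-17"),
    Spot("A2", "TTGACCATGGCATTGACCAT", "EST-0001", "clone-17"),
    Spot("A3", "GGCATCGATCGGATCCATGC", "EST-0002", "clone-42"),
]
traced = trace_regions(layout, {"A2", "A3"})
```
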

In other words, the ESTs identified by the method of this
invention can be employed to determine the complete sequence of the mRNA, in the
form of transcribed cDNA, by using the EST as a probe to identify a cDNA clone
corresponding to a full-length transcript, followed by sequencing of that clone. The
EST or the full length cDNA clone can also be used as a probe to identify a genomic
clone or clones that contain the complete gene including regulatory and promoter
regions, exons, and introns.

It should be appreciated that one does not have to be restricted
to using ESTs from the particular tissue from which probe RNA or cDNA is obtained;
rather, any or all ESTs (known or unknown) may be placed on the support.
Hybridization will be used to form diagnostic patterns or to identify which particular
EST is detected. For example, all known ESTs from an organism are used to produce
a "master" solid support to which control sample and disease samples are alternately
hybridized. One then detects a pattern of hybridization associated with the particular
disease state, which then forms the basis of a diagnostic test or the isolation of
disease specific ESTs from which the intact gene may be cloned and sequenced,
leading ultimately to a defined therapeutic target.

Methods for obtaining complete gene sequences from ESTs are
well-known to those of skill in the art. See, generally, Sambrook et al., cited above.
Briefly, one suitable method involves purifying the DNA from the clone that was
sequenced to give the EST and labeling the isolated insert DNA. Suitable labeling
systems are well known to those of skill in the art [see, e.g., Basic Methods in
Molecular Biology, L. G. Davis et al., ed., Elsevier Press, NY (1986)]. The labeled
EST insert is then used as a probe to screen a lambda phage cDNA library or a
plasmid cDNA library, identifying colonies containing clones related to the probe
cDNA which can be purified by known methods. The ends of the newly purified
clones are then sequenced to identify full length sequences and complete sequencing
of full length clones is performed by enzymatic digestion or primer walking. A
similar screening and clone selection approach can be applied to clones from a
genomic DNA library.

Additionally, an EST or gene identified by this method as
associated with inherited disorders can be used to determine at what stage during
embryonic development the selected gene from which it is derived is developed by
screening embryonic DNA libraries from various stages of development, e.g., 2-cell,
8-cell, etc., for the selected gene. As has been mentioned above, the invention may
be applied in additional temporal modes for monitoring the progression of a disease
state, the efficacy of a particular treatment modality or the aging process of an
individual.

Thus, the methods of this invention permit the identification,
isolation and sequencing of a gene which is differentially expressed in a selected
disease/infection. As described in more detail below, the identified gene may then be
employed to obtain any protein encoded thereby, or may be employed as a target for
diagnostic methods or therapeutic approaches to the treatment of the disease,
including, e.g., drug development.

The same methods as described above for the identification of
genes, including genes of unknown function, which are differentially expressed in a
disease state, may also be employed to identify other genes of interest. For example,
another embodiment of this invention includes a method for identifying a gene of a
pathogen which is expressed in a biological sample of an animal infected with that
pathogen or the gene of the host which is altered in its expression as a result of the
infection.

One such method employs a healthy test surface as described
above, employing defined oligonucleotide/polynucleotides from a sample of a
healthy, uninfected animal. The second such surface has immobilized at pre-defined
regions thereon a plurality of defined oligonucleotide/polynucleotide sequences of
ESTs isolated from at least one analogous tissue or cell sample of an infected animal
(the "infection test surface"). Polynucleotide sequences are isolated from a biological
sample from a healthy animal ("healthy control") and a second sample is similarly
prepared from an animal infected with the selected pathogen ("infection sample").
These two samples are desirably selected from the cell or tissue analogous to that
which provided the immobilized oligonucleotide/polynucleotides. It would also be
possible to provide samples from the nucleic acid of the pathogen itself.

According to the method, the healthy control sample is
contacted with one set of the healthy test surface and the infection test surface
described above for a time sufficient to permit detectable hybridization to occur
between the sample and the immobilized defined oligonucleotide/polynucleotides on
each surface. The results of this hybridization are a first hybridization pattern formed
between the nucleotides of the healthy control sample and the healthy test surface and
a second hybridization pattern formed between the nucleotides of the healthy control
sample and the infection test surface.

In a similar manner, the infection sample is detectably
hybridized to another set of healthy test and infection test surfaces, forming a third
hybridization pattern between the infection sample and the healthy test surface and a
fourth hybridization pattern between the infection sample and the infection test
surface.

Comparing the four hybridization patterns permits detection of
those defined oligonucleotide/polynucleotides which are differentially expressed
between the healthy animal and the animal infected with the pathogen by the presence
of differences in the hybridization patterns at pre-defined regions. As mentioned,
differential expression is not required and simple qualitative analysis is possible by
reference to gene expression which is simply present or absent.

A second embodiment of this method parallels the second 

25 embodiment of the method as applied to disease above, i.e., the same process is 
employed, with the exception that plurality of defined oligonucleotide/polynucleotide 
sequences forming the healthy test sample surface and the infection test sample 
surface are immobilized on a single solid support. The resulting first hybridization 
pattern (healthy control sample with healthy/infection test sample) and second
hybridization pattern (infection sample with healthy/infection test sample) permit
detection of those defined oligonucleotide/polynucleotides which are differentially 
expressed between the healthy control and the infection sample by the presence of 
differences in the hybridization patterns at pre-defined regions. The
oligonucleotide/polynucleotides on each surface which correspond to the pattern
differences may be readily identified with the corresponding ESTs from which the
oligonucleotide/polynucleotides are obtained. 
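
As an illustration of the identification step, the following sketch maps regions showing pattern differences back to the ESTs immobilized there. The lookup table, the region labels and the EST identifiers are hypothetical placeholders; the specification does not prescribe any particular data structure.

```python
from typing import Dict, Iterable, List, Tuple


def ests_for_regions(flagged: Iterable[str],
                     region_to_est: Dict[str, str]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Group flagged regions by the EST immobilized at each region.

    Returns (est -> sorted list of regions, sorted list of regions with no
    known EST). `region_to_est` is an assumed record of what was spotted
    at each pre-defined region when the surface was prepared.
    """
    by_est: Dict[str, List[str]] = {}
    unmapped: List[str] = []
    for region in flagged:
        est = region_to_est.get(region)
        if est is None:
            unmapped.append(region)
        else:
            by_est.setdefault(est, []).append(region)
    for regions in by_est.values():
        regions.sort()
    return by_est, sorted(unmapped)


if __name__ == "__main__":
    layout = {"A1": "EST_0001", "A2": "EST_0002", "B1": "EST_0002"}
    print(ests_for_regions(["B1", "A2", "C9"], layout))
```

Because the position of every immobilized sequence is defined in advance, the lookup is a direct table access; the gene from which each EST derives would then be identified by conventional means.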

As described above for the methods for identifying differential
gene expression between diseased and healthy animals, the
oligonucleotide/polynucleotides on each surface which correspond to the pattern
differences may be readily identified with the corresponding ESTs from which the
oligonucleotide/polynucleotide sequences are obtained and the genes expressed by the
pathogen identified for similar purposes. Other embodiments of these methods may
be developed with resort to the teaching herein, by altering the samples which provide
the defined oligonucleotide/polynucleotides. For example, an EST identified with a
differentially expressed gene by the method of this invention is also useful in
detecting genes expressed in the various stages of a pathogen's development,
particularly the infective stage, and following the course of drug treatment and
emergence of resistant variants. For example, employing the techniques described
above, the EST can be used for detecting a gene in various stages of the parasitic 
Plasmodium species life cycle, which include blood stages, liver stages, and 
gametocyte stages. 

B. Diagnostic Methods

In addition to use of the methods and compositions of this
invention for identifying differentially expressed genes, another embodiment of this
invention provides diagnostic methods for diagnosing a selected disease state, or a 
selected state resulting from aging, exposure to drugs or infection in an animal. 
According to this aspect of the invention, a first surface, described as the healthy test 
surface above, and a second surface, described as the disease test surface or infection
test surface, are prepared depending on the disease or infection to be diagnosed. The 
same processes of detectable hybridization to a first and second set of these surfaces 
with the healthy control sample and disease/infection sample are followed to provide 
the four above-described hybridization patterns, i.e., healthy control sample with 
healthy test surface; healthy control sample with disease/infection test surface;
disease/infection sample with healthy test surface; and disease/infection sample with 
disease/infection test surface. 

The diagnosis of disease or infection is provided by comparing 
the four hybridization patterns. Substantial differences between the first and third 
hybridization patterns, respectively, and the second and fourth hybridization patterns,
respectively, indicate the presence of the selected disease or infection in said animal.
Substantial similarities in the first and third hybridization patterns and second and
fourth hybridization patterns indicate the absence of disease or infection.
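
The diagnostic comparison can be sketched as follows. This is only an illustration under stated assumptions: patterns are mappings from region identifier to signal, "substantial difference" is taken to mean that at least a chosen fraction of regions differ by a chosen fold change, and both pattern pairs are required to differ. The function names and all thresholds are hypothetical; the specification does not define numerical criteria.

```python
from typing import Dict

Pattern = Dict[str, float]  # hypothetical: region identifier -> signal


def fraction_differing(p: Pattern, q: Pattern,
                       fold: float = 2.0, floor: float = 1.0) -> float:
    """Fraction of regions whose signals differ by at least `fold`."""
    regions = set(p) | set(q)
    if not regions:
        return 0.0
    differing = 0
    for region in regions:
        a = max(p.get(region, 0.0), floor)
        b = max(q.get(region, 0.0), floor)
        ratio = b / a
        if ratio >= fold or ratio <= 1.0 / fold:
            differing += 1
    return differing / len(regions)


def indicates_disease(first: Pattern, second: Pattern,
                      third: Pattern, fourth: Pattern,
                      cutoff: float = 0.25) -> bool:
    """Compare first vs. third (healthy test surface) and second vs. fourth
    (disease/infection test surface). Requiring both pairs to differ is an
    assumption made for this sketch."""
    return (fraction_differing(first, third) >= cutoff
            and fraction_differing(second, fourth) >= cutoff)


if __name__ == "__main__":
    first = {"h1": 100.0, "h2": 100.0}   # healthy control on healthy surface
    second = {"d1": 5.0, "d2": 50.0}     # healthy control on disease surface
    third = {"h1": 100.0, "h2": 20.0}    # test sample on healthy surface
    fourth = {"d1": 80.0, "d2": 50.0}    # test sample on disease surface
    print(indicates_disease(first, second, third, fourth))
```

In practice the cutoff would have to be established empirically for each disease or infection and each pair of surfaces.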

A similar embodiment utilizes the single surface bearing both 
the healthy test surface defined oligonucleotide/polynucleotides and the
disease/infection test surface defined oligonucleotide/polynucleotides as described
above. Parallel process steps as described above for detection of genes differentially
expressed in disease and infected states are followed, resulting in a first hybridization
pattern (healthy control sample with single healthy and disease/infection test sample)
and a second hybridization pattern (disease/infection sample with another copy of the
single healthy and disease/infection test sample).

Diagnosis is accomplished by comparing the two hybridization 
patterns, wherein substantial differences between the first and second hybridization
patterns indicate the presence of the selected disease or infection in the animal being 
tested. Substantially similar first and second hybridization patterns indicate the 
absence of disease or infection. This, like many of the foregoing embodiments, may
use known or unknown ESTs derived from many libraries. 

C. Other Methods of the Invention

As is obvious to one of skill in the art upon reading this 
disclosure, the compositions and methods of this invention may also be used for other 
similar purposes. For example, the general methods and compositions may be 
adapted easily by manipulation of the samples selected to provide the standardized
defined oligonucleotide/polynucleotides, and selection of the samples selected for
hybridization thereto. One such modification is the use of this invention to identify 
cell markers of any type, e.g., markers of cancer cells, stem cell markers, and the like. 
Another modification involves the use of the method and compositions to generate 
hybridization patterns useful for forensic identification or an 'expression fingerprint'
of genes for identification of one member of a species from another. Similarly, the
methods of this invention may be adapted for use in tissue matching for
transplantation purposes as well as for molecular histology, i.e., to enable diagnosis of 
disease or disorders in pathology tissue samples such as biopsies. Still another use of 
this method is in monitoring the effects of development and aging upon the gene
expression in a selected animal, by preparing surfaces bearing
oligonucleotide/polynucleotides prepared from samples of standardized younger
members of the species being tested. Additionally, the patient can serve as an internal
control by virtue of having the method applied to blood samples every 5-10 years 
during his lifetime. 

Still another intriguing use of this method is in the area of
monitoring the effects of drugs on gene expression, both in laboratories and during
clinical trials with animals, especially humans. Because the method can be readily
adapted by altering the above parameters, it can essentially be employed to identify
differentially expressed genes of any organism, at any stage of development, and
under the influence of any factor which can affect gene expression.

IV. The Genes and Proteins Identified

Application of the compositions and methods of this invention as
above described also provides other compositions, such as any isolated gene sequence
which is differentially expressed between a normal healthy animal and an animal
having a disease or infection. Another embodiment of this invention is any isolated
pathogen gene sequence which is expressed in tissue or cell samples of an infected
animal. Similarly, an embodiment of this invention is any gene sequence identified by
the methods described herein. 

These gene sequences may be employed in conventional methods to
produce isolated proteins encoded thereby. To produce a protein of this invention,
the DNA sequences of a desired gene identified by the use of the methods of this
invention or portions thereof are inserted into a suitable expression system.
Desirably, a recombinant molecule or vector is constructed in which the
polynucleotide sequence encoding the protein is operably linked to a heterologous
expression control sequence permitting expression of the human protein. Numerous
types of appropriate expression vectors and host cell systems are known in the art for
mammalian (including human) expression, insect, e.g., baculovirus expression, yeast,
fungal, and bacterial expression, by standard molecular biology techniques.

The transfection of these vectors into appropriate host cells, whether
mammalian, bacterial, fungal, or insect, or into appropriate viruses, can result in
expression of the selected proteins. Suitable host cells or cell lines for transfection, 
and viruses, as well as methods for the construction and transfection of such host cells 
and viruses are well-known. Suitable methods for transfection, culture, amplification, 
screening, and product production and purification are also known in the art. 

The genes and proteins identified by this invention can be employed, if
desired, in diagnostic compositions useful for the diagnosis of a disease or infection
using conventional diagnostic assays. For example, a diagnostic reagent can be
developed which detectably targets a gene sequence or protein of this invention in a
biological sample of an animal. Such a reagent may be a complementary nucleotide
sequence, an antibody (monoclonal, recombinant or polyclonal), or a chemically
derived agonist or antagonist. Alternatively, the proteins and polynucleotide
sequences of this invention, fragments of same, or complementary sequences thereto,
may themselves be useful as diagnostic reagents for diagnosing disease states with
which the ESTs of the invention are associated. These reagents may optionally be
labelled using diagnostic labels, such as radioactive labels, colorimetric enzyme label
systems and the like conventionally used in diagnostic or therapeutic methods, e.g.
Northern and Western blotting, antigen-antibody binding and the like. The selection
of the appropriate assay format and label system is within the skill of the art and may 
readily be chosen without requiring additional explanation by resort to the wealth of
art in the diagnostic area. 

Additionally, genes and proteins identified according to this invention 
may be used therapeutically. For example, the EST-containing gene sequences may
be useful in gene therapy, to provide a gene sequence which in a disease is not
properly or sufficiently expressed. In such a method, a selected gene sequence of this
invention is introduced into a suitable vector or other delivery system for delivery to a
cell containing a defect in the selected gene. Suitable delivery systems are well
known to those of skill in the art and enable the desired EST or gene to be
incorporated into the target cell and to be translated by the cell. The EST or gene
sequence may be introduced to mutate the existing gene by recombination or provide 
an active copy thereof in addition to the inactive gene to replace its function. 

Alternatively, a protein encoded by an EST or gene of the invention 
may be useful as a therapeutic reagent for delivery of a biologically active protein,
particularly when the disease state is associated with a deficiency of this protein.
Such a protein may be incorporated into an appropriate therapeutic formulation, alone
or in combination with other active ingredients. Methods of formulating such
therapeutic compositions, as well as suitable pharmaceutical carriers, and the like, are
well known to those of skill in the art. Still an additional method of delivering the
missing protein encoded by an EST, or the gene from which a selected EST was
derived, involves expressing it directly in vivo. Systems for such in vivo expression
are well known in the art. 

Yet another use of the ESTs, genes identified according to the methods 
of this invention, or the proteins encoded thereby is as a target for the screening and
development of natural or synthetic chemical compounds which have utility as
therapeutic drugs for the treatment of disease states associated with the identified
genes and ESTs derived therefrom. As one example, a compound capable of binding
to such a protein encoded by such a gene and either preventing or enhancing its
biological activity may be a useful drug component for the treatment or prevention of
such disease states.

Conventional assays and techniques may be used for the screening and 
development of such drugs. As one example, a method for identifying compounds 
which specifically bind to or inhibit or activate proteins encoded by these gene 
sequences can include simply the steps of contacting a selected protein or gene
product with a test compound to permit binding of the test compound to the protein;
and determining the amount of test compound, if any, which is bound to the protein.
Such a method may involve the incubation of the test compound and the protein
immobilized on a solid support. Still other conventional methods of drug screening
can involve employing a suitable computer program to determine compounds having
similar or complementary chemical structures to that of the gene product or portions
thereof and screening those compounds either for competitive binding to the protein
to detect enhanced or decreased activity in the presence of the selected compound.

Thus, through use of such methods, the present invention is anticipated
to provide compounds capable of interacting with these genes, ESTs, or encoded
proteins, or fragments thereof, and either enhancing or decreasing the biological
activity, as desired. Such compounds are believed to be encompassed by this
invention.

Numerous modifications and variations of the present invention are
included in the above-identified specification and are expected to be obvious to one of
skill in the art. Such modifications and alterations to the compositions and processes 
of the present invention are believed to be encompassed in the scope of the claims 
appended hereto. 

WHAT IS CLAIMED IS:

1. A method for identifying genes which are differentially expressed in 
two different pre-determined states of an organism comprising:

a. providing a first surface on which is immobilized at pre-defined
regions on said surface a plurality of defined oligonucleotide/polynucleotide
sequences, each sequence selected from the group consisting of a fragment of an EST,
an entire EST a fragment of a gene or an entire gene, isolated from a DNA library
prepared from at least one selected cell, tissue, organ or organism sample in a first
state and present in excess relative to the polynucleotide to be hybridized;

b. providing a second surface on which is immobilized at pre-defined 
regions on said surface a plurality of defined oligonucleotide/polynucleotide 
sequences, each sequence selected from the group consisting of a fragment of an EST,
an entire EST a fragment of a gene or an entire gene, isolated from a DNA library
prepared from at least one selected cell, tissue, organ or organism sample in a second
state and present in excess relative to the polynucleotide to be hybridized; 

c. detectably hybridizing to a set of said first and second surfaces 
polynucleotide sequences isolated from a sample from a said organism in said first 
state, said sample selected from sources analogous to the sources of step (a), said
hybridization sufficient to form a first and second hybridization pattern on each said
first and second surface, 

d. detectably hybridizing to a set of said first and second surfaces
polynucleotide sequences isolated from a sample from said organism in said second
state, said sample selected from sources analogous to the sources of step (c), said
hybridization sufficient to form a third and fourth hybridization pattern on each said
first and second surface, 

e. comparing at least two of the four hybridization patterns, 
wherein genes differentially expressed in said first and second states are identified by 
the presence of differences in the hybridization patterns at pre-defined regions; 

f. identifying the oligonucleotide/polynucleotides on each surface
which correspond to said pattern differences and the corresponding ESTs or larger
gene fragment from which the oligonucleotide/polynucleotides were obtained, 
whereby identification of the EST or larger gene fragment permits identification of 
the gene from which the ESTs or larger gene fragment were derived. 

2. The method according to Claim 1 wherein said first and second states are
respectively healthy and disease; pathogen uninfected and pathogen infected; a first 
progression state and a second progression of a disease or infection; a first treatment 
state and a second treatment state of a disease or infection; or a first developmental
and a second developmental state.

3. The method according to Claim 1 wherein said organism is a plant or an
animal.

4. The method according to Claim 3 wherein said animal is a human.

5. A method for identifying genes which are differentially expressed in a 
normal healthy animal and an animal having a disease comprising: 

a. providing a first surface on which is immobilized at pre-defined
regions on said surface a plurality of defined oligonucleotide/polynucleotide
sequences, each sequence selected from the group consisting of a
fragment of an EST, an entire EST a fragment of a gene or an entire gene, isolated
from a DNA library prepared from at least one selected cell, tissue, organ or organism
sample in a healthy animal and present in excess relative to the polynucleotide to be
hybridized;

b. providing a second surface on which is immobilized at pre-defined
regions of said surface a plurality of defined oligonucleotide/polynucleotide
sequences, each sequence selected from the group consisting of a
fragment of an EST, an entire EST a fragment of a gene or an entire gene, isolated
from a DNA library prepared from at least one selected cell, tissue, organ or organism
sample from an animal having said disease and present in excess relative to the 
polynucleotide to be hybridized; 

c. detectably hybridizing to a set of said first and second surfaces 
polynucleotide sequences isolated from a sample from a healthy animal, said sample
selected from sources analogous to the sources of step (a), said hybridization
sufficient to form a first and second hybridization pattern on each said first and 
second surface, said sample selected from a cell or tissue sample analogous to the 
sample of step (a), said hybridization sufficient to form a first and second 
hybridization pattern on each said first and second surface; 

d. detectably hybridizing to a set of said first and second surfaces
polynucleotide sequences isolated from a sample from an animal having said disease, 
said sample selected from a cell or tissue sample analogous to the sample of step (c), 
said hybridization sufficient to form a third and fourth hybridization pattern on each
said first and second surface,

e. comparing at least two of the four hybridization patterns, 
wherein genes differentially expressed in said first and second states are identified by 
the presence of differences in the hybridization patterns at pre-defined regions;

f. identifying the oligonucleotide/polynucleotides on each surface
which correspond to said pattern differences and the corresponding ESTs or larger
gene fragment from which the oligonucleotide/polynucleotides were obtained,
whereby identification of the EST or larger gene fragment permits identification of 
the gene from which the ESTs or larger gene fragment were derived. 

6. A method for identifying genes which are differentially expressed in a
normal healthy animal and an animal having a disease comprising:

a. providing a surface on which is immobilized at pre-defined 
regions on said surface a plurality of defined oligonucleotide/polynucleotide 
sequences, each sequence selected from the group consisting of a fragment of an EST,
an entire EST a fragment of a gene or an entire gene isolated from a DNA library
prepared from the group selected from at least one selected cell, tissue, organ or
organism sample of a healthy animal and an analogous selected sample of an
animal having said disease and both present in excess relative to the polynucleotide to 
be hybridized; 

b. detectably hybridizing to a first copy of said surface
polynucleotide sequences isolated from a healthy animal, said sample selected from a
cell or tissue sample analogous to the sample of step (a), said hybridization sufficient 
to form a first hybridization pattern on said surface; 

c. detectably hybridizing to a second copy of said surface
polynucleotide sequences isolated from an animal having said disease, said sample
selected from a cell or tissue sample analogous to the sample of step (a), said
hybridization sufficient to form a second hybridization pattern on said surface; 

d. comparing the two hybridization patterns, wherein genes 
differentially expressed in a disease state are identified by the presence of differences
in the hybridization patterns at pre-defined regions;

e. identifying the oligonucleotide/polynucleotides on each surface
which correspond to said pattern differences and the corresponding ESTs from which 
the oligonucleotide/polynucleotides are obtained, whereby identification of the EST 
permits identification of the gene from which the ESTs were derived. 

7. A method for identifying a gene of a pathogen which is expressed in a 
biological sample of an animal infected with said pathogen comprising: 

a. providing a first surface on which is immobilized at pre-defined
regions on said surface a plurality of defined oligonucleotide/polynucleotide
sequences, each sequence selected from the group consisting of a fragment of an EST,
an entire EST a fragment of a gene or an entire gene isolated from a DNA library 
prepared from at least one selected cell, tissue, organ or organism sample of a 
healthy, uninfected animal and present in excess relative to the polynucleotide to be 
hybridized; 

b. providing a second surface on which is immobilized at pre-defined
regions of said surface a plurality of defined oligonucleotide/polynucleotide
sequences, each sequence selected from the group consisting of a fragment of an EST, 
an entire EST a fragment of a gene or an entire gene isolated from at least one 
selected cell, tissue, organ or organism sample of an infected animal; 

c. detectably hybridizing to a set of said first and second surfaces
polynucleotide sequences isolated from a sample from a healthy animal, said sample
selected from a cell or tissue sample analogous to the sample of step (a), said 
hybridization sufficient to form first and second hybridization patterns on each said 
first and second surface, 

d. detectably hybridizing to a set of said first and second surfaces
polynucleotide sequences isolated from a sample from an infected animal, said
sample selected from a cell or tissue sample analogous to the sample of step (a), said 
hybridization sufficient to form third and fourth hybridization patterns on each said 
first and second surface, 

e. comparing the four hybridization patterns, wherein genes of
said pathogen which are expressed in an infected animal are identified by the
presence of differences in the hybridization patterns at pre-defined regions; 

f. identifying the oligonucleotide/polynucleotides on each surface 
which correspond to said pattern differences and the corresponding ESTs from which
the oligonucleotide/polynucleotides are obtained, whereby identification of the EST
permits identification of the gene from which the ESTs were derived.

8. A method for identifying a gene of a pathogen which is expressed in a
biological sample of an animal infected with said pathogen comprising: 

a. providing a surface on which is immobilized at pre-defined 
regions on said surface a plurality of defined oligonucleotide/polynucleotide
sequences, each sequence selected from the group consisting of a fragment of an EST,
an entire EST a fragment of a gene or an entire gene isolated from a DNA library
prepared from the group selected from at least one selected cell, tissue, organ or
organism sample of a healthy animal and an analogous selected sample of an
animal having said disease and both present in excess relative to the polynucleotide to
be hybridized;

b. detectably hybridizing to a first copy of said surface
polynucleotide sequences isolated from a sample from a healthy animal, said sample 
selected from a cell or tissue sample analogous to the sample of step (a), said 
hybridization sufficient to form a first hybridization pattern on said surface; 

c. detectably hybridizing to a second copy of said surface
polynucleotide sequences isolated from a sample from an infected animal, said
sample selected from a cell or tissue sample analogous to the sample of step (a), said 
hybridization sufficient to form a second hybridization pattern on said surface; 

d. comparing the two hybridization patterns, wherein genes of
said pathogen which are expressed in an infected animal are identified by the
presence of differences in the hybridization patterns at pre-defined regions;

e. identifying the oligonucleotide/polynucleotides on each surface 
which correspond to said pattern differences and the corresponding ESTs from which 
the oligonucleotide/polynucleotides are obtained, whereby identification of the EST
permits identification of the gene from which the ESTs were derived.

9. A composition suitable for use in hybridization comprising a solid
surface on which is immobilized at pre-defined regions on said surface a plurality of 
defined oligonucleotide/polynucleotide sequences for hybridization, each sequence
selected from the group consisting of a fragment of an EST, an entire EST a fragment
of a gene or an entire gene isolated from a DNA library prepared from the group 
selected from at least one selected cell, tissue, organ or organism sample of a healthy 
animal, at least one analogous sample of said animal having a disease, at least one 
analogous sample of said animal infected with a microbial pathogen, and any
combination thereof.

10. An isolated gene sequence which is differentially expressed in a
normal healthy animal and an animal having a disease, identified by the method of 
claim 1. 

11. An isolated pathogen gene sequence which is expressed in tissue or
cell samples of an infected animal identified by the method of claim 7.

12. A diagnostic composition useful for the diagnosis of a disease
comprising a reagent capable of detectably targeting a gene sequence of claim 10 in a
biological sample of an animal.

13. A diagnostic composition useful for the diagnosis of infection by a
pathogen comprising a reagent capable of detectably targeting a gene sequence of
claim 11 in a biological sample of an animal.

14. An isolated protein produced by expression of a gene sequence of 
claim 10. 

15. An isolated pathogen protein produced by expression of a gene
sequence of claim 11.

16. A therapeutic composition comprising a protein or fragment thereof 
selected from the group consisting of a protein of claim 10 and a protein of claim 15. 

17. A method for diagnosing a selected disease or infection in an animal
comprising:

a. providing a first surface on which is immobilized at pre-defined
regions on said surface a plurality of defined oligonucleotide/polynucleotide
sequences, each sequence selected from the group consisting of a fragment of an EST,
an entire EST a fragment of a gene or an entire gene, isolated from a DNA library
prepared from at least one selected cell, tissue, organ or organism sample of a healthy 
animal and present in excess relative to the polynucleotide to be hybridized; 

b. providing a second surface on which is immobilized at pre-defined
regions of said surface a plurality of defined oligonucleotide/polynucleotide
sequences, each sequence comprising a fragment of an EST isolated from at least one
said tissue of an animal having said disease; 

c. detectably hybridizing to a set of said first and second surfaces
polynucleotide sequences isolated from a DNA library prepared from a sample from a 
healthy animal, said sample selected from a cell or tissue sample analogous to the 
sample of step (a), said hybridization sufficient to form a first and second
hybridization pattern on each said first and second surface;

d. detectably hybridizing to a set of said first and second surfaces 
polynucleotide sequences isolated from a DNA library prepared from a sample from
an animal having said disease, said sample selected from a cell or tissue sample
analogous to the sample of step (c), said hybridization sufficient to form a third and
fourth hybridization pattern on each said first and second surface;

e. comparing the four hybridization patterns, wherein substantial
differences between the first and third hybridization patterns and the second and
fourth hybridization patterns indicates the presence of said selected disease or
infection in said animal, and substantial similarities in said first and third
hybridization patterns and second and fourth hybridization patterns indicates the
absence of disease or infection. 

18. A method for diagnosing a selected disease or infection in an animal 
comprising: 

a. providing a surface on which is immobilized at pre-defined
regions on said surface a plurality of defined oligonucleotide/polynucleotide
sequences, each sequence comprising a fragment of an EST isolated from a DNA 
library prepared from the group consisting of a selected cell or tissue sample of a 
healthy animal and an analogous selected cell or tissue sample of an animal having
said disease;

b. detectably hybridizing to a first copy of said surface 
polynucleotide sequences isolated from a sample from a healthy animal, said sample 
selected from a cell or tissue sample analogous to the sample of step (a), said
hybridization sufficient to form a first hybridization pattern on said surface; 

c. detectably hybridizing to a second copy of said surface
polynucleotide sequences isolated from a DNA library prepared from a sample from
an animal having said disease, said sample selected from a cell or tissue sample 
analogous to the sample of step (a), said hybridization sufficient to form a second 
hybridization pattern on said surface; 

d. comparing the two hybridization patterns, wherein substantial
differences between the first and second hybridization patterns indicates the presence
of said selected disease or infection in said animal, and substantial similarities in said 
first and second hybridization patterns indicates the absence of disease or infection. 

COMPARATIVE GENE TRANSCRIPT ANALYSIS

1. FIELD OF INVENTION 

The present invention is in the field of molecular 
biology and computer science; more particularly, the
present invention describes methods of analyzing gene
transcripts and diagnosing the genetic expression of cells
and tissue. 

2. BACKGROUND OF THE INVENTION 

Until very recently, the history of molecular biology
has been written one gene at a time. Scientists have
observed the cell's physical changes, isolated mixtures
from the cell or its milieu, purified proteins, sequenced
proteins and therefrom constructed probes to look for the
corresponding gene.

Recently, different nations have set up massive
projects to sequence the billions of bases in the human
genome. These projects typically begin with dividing the
genome into large portions of chromosomes and then
determining the sequences of these pieces, which are then
analyzed for identity with known proteins or portions
thereof, known as motifs. Unfortunately, the majority of
genomic DNA does not encode proteins and though it is
postulated to have some effect on the cell's ability to
make protein, its relevance to medical applications is not
understood at this time.

A third methodology involves sequencing only the
transcripts encoding the cellular machinery actively
involved in making protein, namely the mRNA. The advantage
is that the cell has already edited out all the non-coding
DNA, and it is relatively easy to identify the protein-coding
portion of the RNA. The utility of this approach
was not immediately obvious to genomic researchers. In
fact, when cDNA sequencing was initially proposed, the
method was roundly denounced by those committed to genomic
sequencing. For example, the head of the U.S. Human Genome
project discounted cDNA sequencing as not valuable and
refused to approve funding of projects.

In this disclosure, we teach methods for analyzing
DNA, including cDNA libraries. Based on our analyses and
research, we see each individual gene product as a "pixel"
of information, which relates to the expression of that,
and only that, gene. We teach herein methods whereby the
individual "pixels" of gene expression information can be
combined into a single gene transcript "image," in which
each of the individual genes can be visualized
simultaneously, allowing relationships between the gene
pixels to be easily visualized and understood.

We further teach a new method which we call electronic
subtraction. Electronic subtraction will enable the gene
researcher to turn a single image into a moving picture,
one which describes the temporality or dynamics of gene
expression, at the level of a cell or a whole tissue. It
is that sense of "motion" of cellular machinery on the
scale of a cell or organ which constitutes the new
invention herein. This constitutes a new view into the
process of living cell physiology and one which holds great
promise to unveil and discover new therapeutic and
diagnostic approaches in medicine.

We teach another method which we call "electronic
northern," which tracks the expression of a single gene
across many types of cells and tissues.
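
As a rough illustration only, an electronic northern can be sketched as a lookup of one gene's frequency across several sequenced libraries. The library contents, the gene labels and the function name `electronic_northern` below are hypothetical, and normalizing each count by library size is an assumption rather than a step stated above.

```python
from collections import Counter

def electronic_northern(libraries, gene):
    """Return {library name: fraction of clones identified as `gene`}.

    `libraries` maps a library name to the list of gene identifications
    obtained for its sequenced clones (one entry per clone).
    """
    profile = {}
    for name, identified_genes in libraries.items():
        counts = Counter(identified_genes)
        total = len(identified_genes)
        profile[name] = counts[gene] / total if total else 0.0
    return profile

# Made-up identifications for two libraries, four clones each.
libraries = {
    "library_1": ["gene_a", "gene_b", "gene_a", "gene_c"],
    "library_2": ["gene_d", "gene_a", "gene_c", "gene_c"],
}
profile = electronic_northern(libraries, "gene_c")
```

Here `profile` holds one relative frequency per library, which is the single-gene, many-tissue view described above.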

Nucleic acids (DNA and RNA) carry within their
sequence the hereditary information and are therefore the
prime molecules of life. Nucleic acids are found in all
living organisms including bacteria, fungi, viruses, plants
and animals. It is of interest to determine the relative
abundance of different discrete nucleic acids in different
cells, tissues and organisms over time under various
conditions, treatments and regimes.

All dividing cells in the human body contain the same
set of 23 pairs of chromosomes. It is estimated that these
autosomal and sex chromosomes encode approximately 100,000
genes. The differences among different types of cells are
believed to reflect the differential expression of the
100,000 or so genes. Fundamental questions of biology
could be answered by understanding which genes are
transcribed and knowing the relative abundance of
transcripts in different cells.

Previously, the art has only provided for the analysis
of a few known genes at a time by standard molecular
biology techniques such as PCR, northern blot analysis, or
other types of DNA probe analysis such as in situ
hybridization. Each of these methods allows one to analyze
the transcription of only known genes and/or small numbers
of genes at a time. Nucl. Acids Res. 19, 7097-7104 (1991);
Nucl. Acids Res. 18, 4833-42 (1990); Nucl. Acids Res. 18,
2789-92 (1989); European J. Neuroscience 2, 1063-1073
(1990); Analytical Biochem. 187, 364-73 (1990); Genet.
Annals Techn. Appl. 7, 64-70 (1990); GATA 8(4), 129-33
(1991); Proc. Natl. Acad. Sci. USA 85, 1696-1700 (1988);
Nucl. Acids Res. 19, 1954 (1991); Proc. Natl. Acad. Sci.
USA 88, 1943-47 (1991); Nucl. Acids Res. 19, 6123-27
(1991); Proc. Natl. Acad. Sci. USA 85, 5738-42 (1988);
Nucl. Acids Res. 16, 10937 (1988).

Studies of the number and types of genes whose
transcription is induced or otherwise regulated during cell
processes such as activation, differentiation, aging, viral
transformation, morphogenesis, and mitosis have been
pursued for many years, using a variety of methodologies.
One of the earliest methods was to isolate and analyze
levels of the proteins in a cell, tissue, organ system, or
even organisms both before and after the process of
interest. One method of analyzing multiple proteins in a
sample is using 2-dimensional gel electrophoresis, wherein
proteins can be, in principle, identified and quantified as
individual bands, and ultimately reduced to a discrete
signal. At present, 2-dimensional analysis only resolves
approximately 15% of the proteins. In order to positively
analyze those bands which are resolved, each band must be
excised from the membrane and subjected to protein sequence
analysis using Edman degradation. Unfortunately, most of
the bands were present in quantities too small to obtain a
reliable sequence, and many of those bands contained more
than one discrete protein. An additional difficulty is
that many of the proteins were blocked at the
amino-terminus, further complicating the sequencing
process.

Analyzing differentiation at the gene transcription
level has overcome many of these disadvantages and
drawbacks, since the power of recombinant DNA technology
allows amplification of signals containing very small
amounts of material. The most common method, called
"hybridization subtraction," involves isolation of mRNA
from the biological specimen before (B) and after (A) the
developmental process of interest, transcribing one set of
mRNA into cDNA, subtracting specimen B from specimen A
(mRNA from cDNA) by hybridization, and constructing a cDNA
library from the non-hybridizing mRNA fraction. Many
different groups have used this strategy successfully, and
a variety of procedures have been published and improved
upon using this same basic scheme. Nucl. Acids Res. 19,
7097-7104 (1991); Nucl. Acids Res. 18, 4833-42 (1990);
Nucl. Acids Res. 18, 2789-92 (1989); European J.
Neuroscience 2, 1063-1073 (1990); Analytical Biochem. 187,
364-73 (1990); Genet. Annals Techn. Appl. 7, 64-70 (1990);
GATA 8(4), 129-33 (1991); Proc. Natl. Acad. Sci. USA 85,
1696-1700 (1988); Nucl. Acids Res. 19, 1954 (1991); Proc.
Natl. Acad. Sci. USA 88, 1943-47 (1991); Nucl. Acids Res.
19, 6123-27 (1991); Proc. Natl. Acad. Sci. USA 85, 5738-42
(1988); Nucl. Acids Res. 16, 10937 (1988).

Although each of these techniques has particular
strengths and weaknesses, there are still some limitations
and undesirable aspects of these methods. First, the time
and effort required to construct such libraries is quite
large. Typically, a trained molecular biologist might
expect construction and characterization of such a library
to require 3 to 6 months, depending on the level of skill,
experience, and luck. Second, the resulting subtraction
libraries are typically inferior to the libraries
constructed by standard methodology. A typical
conventional cDNA library should have a clone complexity of
at least 10^ clones, and an average insert size of 1-3 kB.
In contrast, subtracted libraries can have complexities of
10^ or 10^ and average insert sizes of 0.2 kB. Therefore,
there can be a significant loss of clone and sequence
information associated with such libraries. Third, this
approach allows the researcher to capture only the genes
induced in specimen A relative to specimen B, not
vice-versa, nor does it easily allow comparison to a third
specimen of interest (C). Fourth, this approach requires
very large amounts (hundreds of micrograms) of "driver"
mRNA (specimen B), which significantly limits the number
and type of subtractions that are possible since many
tissues and cells are very difficult to obtain in large
quantities.

Fifth, the resolution of the subtraction is dependent
upon the physical properties of DNA:DNA or RNA:DNA
hybridization. The ability of a given sequence to find a
hybridization match is dependent on its unique CoT value.
The CoT value is a function of the number of copies
(concentration) of the particular sequence, multiplied by
the time of hybridization. It follows that for sequences
which are abundant, hybridization events will occur very
rapidly (low CoT value), while rare sequences will form
duplexes at very high CoT values. CoT values which allow
such rare sequences to form duplexes and therefore be
effectively selected are difficult to achieve in a
convenient time frame. Therefore, hybridization
subtraction is simply not a useful technique with which to
study relative levels of rare mRNA species. Sixth, this
problem is further complicated by the fact that duplex
formation is also dependent on the nucleotide base
composition for a given sequence. Those sequences rich in
G + C form stronger duplexes than those with high contents
of A + T. Therefore, the former sequences will tend to be
removed selectively by hybridization subtraction. Seventh,
it is possible that hybridization between nonexact matches
can occur. When this happens, the expression of a
homologous gene may "mask" expression of a gene of
interest, artificially skewing the results for that
particular gene.
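
The CoT argument in the fifth point can be illustrated with a small calculation. The sketch below assumes only the simple relation stated above (CoT = concentration x time) and uses arbitrary units; the function name and the numbers are hypothetical and are not taken from the cited art.

```python
def time_to_reach_cot(target_cot, concentration):
    """Hybridization time needed for one sequence to reach `target_cot`.

    Assumes CoT = concentration * time, in arbitrary units.
    """
    if concentration <= 0:
        raise ValueError("concentration must be positive")
    return target_cot / concentration

# Same target CoT for both messages, but the rare message is present
# at a 1000-fold lower concentration than the abundant one.
abundant_time = time_to_reach_cot(target_cot=10.0, concentration=1.0)
rare_time = time_to_reach_cot(target_cot=10.0, concentration=0.001)
```

Under this assumption the rare message needs roughly a thousand times longer to reach the same CoT, which is why rare species are hard to select in a convenient time frame.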

Matsubara and Okubo proposed using partial cDNA
sequences to establish expression profiles of genes which
could be used in functional analyses of the human genome.
Matsubara and Okubo warned against using random priming, as
it creates multiple unique DNA fragments from individual
mRNAs and may thus skew the analysis of the number of
particular mRNAs per library. They sequenced randomly
selected members from a 3'-directed cDNA library and
established the frequency of appearance of the various
ESTs. They proposed comparing lists of ESTs from various
cell types to classify genes. Genes expressed in many
different cell types were labeled housekeepers and those
selectively expressed in certain cells were labeled
cell-specific genes, even in the absence of the full sequence of
the gene or the biological activity of the gene product.

The present invention avoids the drawbacks of the
prior art by providing a method to quantify the relative
abundance of multiple gene transcripts in a given
biological specimen by the use of high-throughput
sequence-specific analysis of individual RNAs and/or their
corresponding cDNAs.

The present invention offers several advantages over
current protein discovery methods which attempt to isolate
individual proteins based upon biological effects. The
method of the instant invention provides for detailed
diagnostic comparisons of cell profiles revealing numerous
changes in the expression of individual transcripts.

The instant invention provides several advantages over
current subtraction methods including a more complex
library analysis (10^ to 10^ clones as compared to 10^
clones) which allows identification of low abundance
messages as well as enabling the identification of messages
which either increase or decrease in abundance. These
large libraries are very routine to make in contrast to the
libraries of previous methods. In addition, homologues can
easily be distinguished with the method of the instant
invention.

This method is very convenient because it organizes a
large quantity of data into a comprehensible, digestible
format. The most significant differences are highlighted
by electronic subtraction. In-depth analyses are made more
convenient.



WO 95/20681 PCT/US95/01160

The present invention provides several advantages over
previous methods of electronic analysis of cDNA. The
method is particularly powerful when more than 100 and
preferably more than 1,000 gene transcripts are analyzed.
In such a case, new low-frequency transcripts are
discovered and tissue typed.

High resolution analysis of gene expression can be
used directly as a diagnostic profile or to identify
disease-specific genes for the development of more classic
diagnostic approaches.

This process is defined as gene transcript frequency 
analysis. The resulting quantitative analysis of the gene 
transcripts is defined as comparative gene transcript 
analysis. 

3. SUMMARY OF THE INVENTION

The invention is a method of analyzing a specimen
containing gene transcripts comprising the steps of (a)
producing a library of biological sequences; (b) generating
a set of transcript sequences, where each of the transcript
sequences in said set is indicative of a different one of
the biological sequences of the library; (c) processing the
transcript sequences in a programmed computer (in which a
database of reference transcript sequences indicative of
reference sequences is stored), to generate an identified
sequence value for each of the transcript sequences, where
each said identified sequence value is indicative of
sequence annotation and a degree of match between one of
the biological sequences of the library and at least one of
the reference sequences; and (d) processing each said
identified sequence value to generate final data values
indicative of the number of times each identified sequence
value is present in the library.
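
Step (d) amounts to a tally of identified sequence values. A minimal sketch, assuming each identified sequence value can be represented as a (reference entry, degree of match) pair; the clone identifiers, reference labels and function name are hypothetical and are not the program of Table 5.

```python
from collections import Counter

# Each record: (clone id, matched reference entry, degree of match).
# All values are made up for illustration.
identified_sequences = [
    ("clone_001", "REF_A", "exact"),
    ("clone_002", "REF_B", "exact"),
    ("clone_003", "REF_A", "exact"),
    ("clone_004", "REF_A", "similar"),
    ("clone_005", None, "non-match"),
]

def final_data_values(records):
    """Count how often each identified sequence value occurs in the library."""
    return Counter((entry, match) for _clone, entry, match in records)

counts = final_data_values(identified_sequences)
```

Each key of `counts` is one identified sequence value and each count is the number of times it is present in the library.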

The invention also includes a method of comparing two
specimens containing gene transcripts. The first specimen
is processed as described above. The second specimen is
used to produce a second library of biological sequences,
which is used to generate a second set of transcript
sequences, where each of the transcript sequences in the
second set is indicative of one of the biological sequences
of the second library. Then the second set of transcript
sequences is processed in a programmed computer to generate
a second set of identified sequence values, namely the
further identified sequence values, each of which is
indicative of a sequence annotation and includes a degree
of match between one of the biological sequences of the
second library and at least one of the reference sequences.
The further identified sequence values are processed to
generate further final data values indicative of the number
of times each further identified sequence value is present
in the second library. The final data values from the
first specimen and the further identified sequence values
from the second specimen are processed to generate ratios
of transcript sequences, which indicate the differences in
the number of gene transcripts between the two specimens.

In a further embodiment, the method includes
quantifying the relative abundance of mRNA in a biological
specimen by (a) isolating a population of mRNA transcripts
from a biological specimen; (b) identifying genes from
which the mRNA was transcribed by a sequence-specific
method; (c) determining the numbers of mRNA transcripts
corresponding to each of the genes; and (d) using the mRNA
transcript numbers to determine the relative abundance of
mRNA transcripts within the population of mRNA transcripts.

Also disclosed is a method of producing a gene
transcript image analysis by first obtaining a mixture of
mRNA, from which cDNA copies are made. The cDNA is
inserted into a suitable vector which is used to transfect
suitable host strain cells which are plated out and
permitted to grow into clones, each clone representing a
unique mRNA. A representative population of clones
transfected with cDNA is isolated. Each clone in the
population is identified by a sequence-specific method
which identifies the gene from which the unique mRNA was
transcribed. The number of times each gene is identified
to a clone is determined to evaluate gene transcript
abundance. The genes and their abundances are listed in
order of abundance to produce a gene transcript image.

In a further embodiment, the relative abundance of the
gene transcripts in one cell type or tissue is compared
with the relative abundance of gene transcript numbers in a
second cell type or tissue in order to identify the
differences and similarities.

In a further embodiment, the method includes a system
for analyzing a library of biological sequences including a
means for receiving a set of transcript sequences, where
each of the transcript sequences is indicative of a
different one of the biological sequences of the library;
and a means for processing the transcript sequences in a
computer system in which a database of reference transcript
sequences indicative of reference sequences is stored,
wherein the computer is programmed with software for
generating an identified sequence value for each of the
transcript sequences, where each said identified sequence
value is indicative of a sequence annotation and the degree
of match between a different one of the biological
sequences of the library and at least one of the reference
sequences, and for processing each said identified sequence
value to generate final data values indicative of the
number of times each identified sequence value is present
in the library.

In essence, the invention is a method and system for
quantifying the relative abundance of gene transcripts in a
biological specimen. The invention provides a method for
comparing the gene transcript image from two or more
different biological specimens in order to distinguish
between the two specimens and identify one or more genes
which are differentially expressed between the two
specimens. Thus, this gene transcript image and its
comparison can be used as a diagnostic. One embodiment of
the method generates high-throughput sequence-specific
analysis of multiple RNAs or their corresponding cDNAs: a
gene transcript image. Another embodiment of the method
produces the gene transcript imaging analysis by the use of
high-throughput cDNA sequence analysis. In addition, two
or more gene transcript images can be compared and used to
detect or diagnose a particular biological state, disease,
or condition which is correlated to the relative abundance
of gene transcripts in a given cell or population of cells.

4. DESCRIPTION OF THE TABLES AND DRAWINGS
4.1. TABLES 

Table 1 presents a detailed explanation of the letter
codes utilized in Tables 2-5.

Table 2 lists the one hundred most common gene
transcripts. It is a partial list of isolates from the
HUVEC cDNA library prepared and sequenced as described
below. The left-hand column refers to the sequence's order
of abundance in this table. The next column labeled
"number" is the clone number of the first HUVEC sequence
identification reference matching the sequence in the
"entry" column number. Isolates that have not been
sequenced are not present in Table 2. The next column,
labeled "N", indicates the total number of cDNAs which have
the same degree of match with the sequence of the reference
transcript in the "entry" column.

The column labeled "entry" gives the NIH GENBANK locus
name, which corresponds to the library sequence numbers.
The "s" column indicates in a few cases the species of the
reference sequence. The code for column "s" is given in
Table 1. The column labeled "descriptor" provides a plain
English explanation of the identity of the sequence
corresponding to the NIH GENBANK locus name in the "entry"
column.

Table 3 is a comparison of the top fifteen most 
abundant gene transcripts in normal monocytes and activated 
macrophage cells. 

Table 4 is a detailed summary of library subtraction
analysis comparing the THP-1 and human macrophage
cDNA sequences. In Table 4, the same code as in Table 2 is
used. Additional columns are for "bgfreq" (abundance
number in the subtractant library), "rfend" (abundance
number in the target library) and "ratio" (the target
abundance number divided by the subtractant abundance
number). As is clear from perusal of the table, when the
abundance number in the subtractant library is "0", the
target abundance number is divided by 0.05. This is a way
of obtaining a result (not possible dividing by 0) and
distinguishing the result from ratios of subtractant
numbers of 1.
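
The "ratio" rule described for Table 4 can be written out in a few lines. This is a minimal sketch and not the program of Table 5; the function and constant names are hypothetical, and only the 0.05 substitution comes from the text above.

```python
ZERO_SUBSTITUTE = 0.05  # used in place of a subtractant abundance of 0

def subtraction_ratio(target_abundance, subtractant_abundance):
    """Target abundance divided by subtractant abundance (Table 4 rule)."""
    if subtractant_abundance == 0:
        # Division by 0 is impossible, so divide by 0.05 instead.
        return target_abundance / ZERO_SUBSTITUTE
    return target_abundance / subtractant_abundance

# The same target abundance gives clearly different ratios depending on
# whether the subtractant library contains zero copies or one copy.
absent_in_subtractant = subtraction_ratio(3, 0)
single_in_subtractant = subtraction_ratio(3, 1)
```

Dividing by 0.05 rather than by 1 keeps transcripts that are absent from the subtractant library well separated from those seen there once.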

Table 5 is the computer program, written in source
code, for generating gene transcript subtraction profiles.

Table 6 is a partial listing of database entries used 
in the electronic northern blot analysis as provided by the 
present invention. 

4.2. BRIEF DESCRIPTION OF THE DRAWINGS 
Figure 1 is a chart summarizing data collected and
stored regarding the library construction portion of
sequence preparation and analysis.

Figure 2 is a diagram representing the sequence of
operations performed by "abundance sort" software in a
class of preferred embodiments of the inventive method.

Figure 3 is a block diagram of a preferred embodiment
of the system of the invention.

Figure 4 is a more detailed block diagram of the
bioinformatics process from new sequence (that has already
been sequenced but not identified) to printout of the
transcript imaging analysis and the provision of database
subscriptions.

5. DETAILED DESCRIPTION OF THE INVENTION

The present invention provides a method to compare the
relative abundance of gene transcripts in different
biological specimens by the use of high-throughput
sequence-specific analysis of individual RNAs or their
corresponding cDNAs (or alternatively, of data representing
other biological sequences). This process is denoted
herein as gene transcript imaging. The quantitative
analysis of the relative abundance for a set of gene
transcripts is denoted herein as "gene transcript image
analysis" or "gene transcript frequency analysis". The
present invention allows one to obtain a profile for gene
transcription in any given population of cells or tissue
from any type of organism. The invention can be applied to
obtain a profile of a specimen consisting of a single cell
(or clones of a single cell), or of many cells, or of
tissue more complex than a single cell and containing
multiple cell types, such as liver.

The invention has significant advantages in the fields
of diagnostics, toxicology and pharmacology, to name a few.
A highly sophisticated diagnostic test can be performed on
the ill patient in whom a diagnosis has not been made. A
biological specimen consisting of the patient's fluids or
tissues is obtained, and the gene transcripts are isolated
and expanded to the extent necessary to determine their
identity. Optionally, the gene transcripts can be
converted to cDNA. A sampling of the gene transcripts is
subjected to sequence-specific analysis and quantified.
These gene transcript sequence abundances are compared
against reference database sequence abundances including
normal data sets for diseased and healthy patients. The
patient has the disease(s) with which the patient's data
set most closely correlates.
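
The comparison against reference data sets can be sketched as follows. The text does not say how "most closely correlates" is measured, so the Pearson correlation used here is an assumption, and the gene labels, abundance values and function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def closest_reference(patient, references):
    """Return (best matching reference name, all correlation scores)."""
    genes = sorted(patient)
    patient_values = [patient[g] for g in genes]
    scores = {
        name: pearson(patient_values, [ref.get(g, 0.0) for g in genes])
        for name, ref in references.items()
    }
    return max(scores, key=scores.get), scores

# Made-up relative abundances for three transcripts.
patient = {"gene_a": 0.10, "gene_b": 0.30, "gene_c": 0.60}
references = {
    "healthy": {"gene_a": 0.50, "gene_b": 0.30, "gene_c": 0.20},
    "disease_x": {"gene_a": 0.12, "gene_b": 0.28, "gene_c": 0.60},
}
best, scores = closest_reference(patient, references)
```

Other similarity measures could be substituted for `pearson` without changing the overall shape of the comparison.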

For example, gene transcript frequency analysis can be
used to differentiate normal cells or tissues from diseased
cells or tissues, just as it highlights differences between
normal monocytes and activated macrophages in Table 3.

In toxicology, a fundamental question is which tests
are most effective in predicting or detecting a toxic
effect. Gene transcript imaging provides highly detailed
information on the cell and tissue environment, some of
which would not be obvious in conventional, less detailed
screening methods. The gene transcript image is a more
powerful method to predict drug toxicity and efficacy.
Similar benefits accrue in the use of this tool in
pharmacology. The gene transcript image can be used
selectively to look at protein categories which are
expected to be affected, for example, enzymes which
detoxify toxins.

In an alternative embodiment, comparative gene
transcript frequency analysis is used to differentiate
between cancer cells which respond to anti-cancer agents
and those which do not respond. Examples of anti-cancer
agents are tamoxifen, vincristine, vinblastine,
podophyllotoxins, etoposide, teniposide, cisplatin,
biologic response modifiers such as interferon, IL-2,
GM-CSF, enzymes, hormones and the like. This method also
provides a means for sorting the gene transcripts by
functional category. In the case of cancer cells,
transcription factors or other essential regulatory
molecules are very important categories to analyze across
different libraries.

In yet another embodiment, comparative gene transcript
frequency analysis is used to differentiate between control
liver cells and liver cells isolated from patients treated
with experimental drugs like FIAU to distinguish between
pathology caused by the underlying disease and that caused
by the drug.

In yet another embodiment, comparative gene transcript
frequency analysis is used to differentiate between brain
tissue from patients treated and untreated with lithium.

In a further embodiment, comparative gene transcript
frequency analysis is used to differentiate between
cyclosporin and FK506-treated cells and normal cells.

In a further embodiment, comparative gene transcript
frequency analysis is used to differentiate between virally
infected (including HIV-infected) human cells and
uninfected human cells. Gene transcript frequency analysis
is also used to rapidly survey gene transcripts in
HIV-resistant, HIV-infected, and HIV-sensitive cells.
Comparison of gene transcript abundance will indicate the
success of treatment and/or new avenues to study.

In a further embodiment, comparative gene transcript
frequency analysis is used to differentiate between
bronchial lavage fluids from healthy and unhealthy patients
with a variety of ailments.

In a further embodiment, comparative gene transcript
frequency analysis is used to differentiate between cell,
plant, microbial and animal mutants and wild-type species.
In addition, the transcript abundance program is adapted to
permit the scientist to evaluate the transcription of one
gene in many different tissues. Such comparisons could
identify deletion mutants which do not produce a gene
product and point mutants which produce a less abundant or
otherwise different message. Such mutations can affect
basic biochemical and pharmacological processes, such as
mineral nutrition and metabolism, and can be isolated by
means known to those skilled in the art. Thus, crops with
improved yields, pest resistance and other factors can be
developed.

In a further embodiment, comparative gene transcript
frequency analysis is used for an interspecies comparative
analysis which would allow for the selection of better
pharmacologic animal models. In this embodiment, humans
and other animals (such as a mouse), or their cultured
cells are treated with a specific test agent. The relative
sequence abundance of each cDNA population is determined.
If the animal test system is a good model, homologous genes
in the animal cDNA population should change expression
similarly to those in human cells. If side effects are
detected with the drug, a detailed transcript abundance
analysis will be performed to survey gene transcript
changes. Models will then be evaluated by comparing basic
physiological changes.

In a further embodiment, comparative gene transcript
frequency analysis is used in a clinical setting to give a
highly detailed gene transcript profile of a patient's
cells or tissue (for example, a blood sample). In
particular, gene transcript frequency analysis is used to
give a high resolution gene expression profile of a
diseased state or condition.

In the preferred embodiment, the method utilizes
high-throughput cDNA sequencing to identify specific
transcripts of interest. The generated cDNA and deduced
amino acid sequences are then extensively compared with
GENBANK and other sequence data banks as described below.
The method offers several advantages over current protein
discovery by two-dimensional gel methods which try to
identify individual proteins involved in a particular
biological effect. Here, detailed comparisons of profiles
of activated and inactive cells reveal numerous changes in
the expression of individual transcripts. After it is
determined whether the sequence is an "exact" match, similar or
a non-match, the sequence is entered into a database.
Next, the numbers of copies of cDNA corresponding to each
gene are tabulated. Although this can be done slowly and
arduously, if at all, by human hand from a printout of all
entries, a computer program is a useful and rapid way to
tabulate this information. The numbers of cDNA copies
(optionally divided by the total number of sequences in the
data set) provide a picture of the relative abundance of
transcripts for each corresponding gene. The list of
represented genes can then be sorted by abundance in the
cDNA population. A multitude of additional types of
comparisons or dimensions are possible and are exemplified
below.
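
The tabulation and sorting just described can be sketched in a few lines. This is an illustration under assumed inputs, not the program of Table 5: the gene labels are made up, the function name `transcript_image` is hypothetical, and breaking ties alphabetically is an arbitrary choice.

```python
from collections import Counter

def transcript_image(gene_per_clone):
    """Tabulate cDNA copies per gene and sort by abundance.

    Returns rows of (gene, copy number, copy number / total sequences),
    most abundant first.
    """
    counts = Counter(gene_per_clone)
    total = sum(counts.values())
    rows = [(gene, n, n / total) for gene, n in counts.items()]
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows

# One gene identification per sequenced clone (made-up labels).
clones = ["gene_b", "gene_a", "gene_b", "gene_c", "gene_b", "gene_a"]
image = transcript_image(clones)
```

The third field of each row is the optional normalization by the total number of sequences in the data set.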

An alternate method of producing a gene transcript
image includes the steps of obtaining a mixture of test
mRNA and providing a representative array of unique probes
whose sequences are complementary to at least some of the
test mRNAs. Next, a fixed amount of the test mRNA is added
to the arrayed probes. The test mRNA is incubated with the
probes for a sufficient time to allow hybrids of the test
mRNA and probes to form. The mRNA-probe hybrids are
detected and the quantity determined. The hybrids are
identified by their location in the probe array. The
quantity of each hybrid is summed to give a population
number. Each hybrid quantity is divided by the population
number to provide a set of relative abundance data termed a
gene transcript image analysis.
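
The final arithmetic of this array-based method is a simple normalization. In the sketch below the array positions and signal values are hypothetical, and hybrid quantities are assumed to be available as one number per probe location.

```python
def relative_abundance(hybrid_quantities):
    """Divide each hybrid quantity by the summed population number.

    `hybrid_quantities` maps a probe location in the array to the
    detected quantity of mRNA-probe hybrid at that location.
    """
    population = sum(hybrid_quantities.values())
    if population == 0:
        raise ValueError("no hybrids were detected")
    return {
        location: quantity / population
        for location, quantity in hybrid_quantities.items()
    }

# Made-up signals keyed by (row, column) position in the probe array.
signals = {(0, 0): 50.0, (0, 1): 150.0, (1, 0): 300.0}
image = relative_abundance(signals)
```

The resulting fractions sum to one and play the same role as the clone-count frequencies obtained by sequencing.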

6. EXAMPLES

The examples below are provided to illustrate the 
subject invention. These examples are provided by way of 
illustration and are not included for the purpose of 
limiting the invention. 

6.1. TISSUE SOURCES AND CELL LINES

For analysis with the computer program claimed herein,
biological sequences can be obtained from virtually any
source. Most popular are tissues obtained from the human
body. Tissues can be obtained from any organ of the body,
any age donor, any abnormality or any immortalized cell
line. Immortal cell lines may be preferred in some
instances because of their purity of cell type; other
tissue samples invariably include mixed cell types. A
special technique is available to take a single cell (for
example, a brain cell) and harness the cellular machinery
to grow up sufficient cDNA for sequencing by the techniques
and analysis described herein (cf. U.S. Patent Nos.
5,021,335 and 5,168,038, which are incorporated by
reference). The examples given herein utilized the
following immortalized cell lines: monocyte-like U-937
cells, activated macrophage-like THP-1 cells, induced
vascular endothelial cells (HUVEC cells) and mast cell-like
HMC-1 cells.

The U-937 cell line is a human histiocytic lymphoma
cell line with monocyte characteristics, established from
malignant cells obtained from the pleural effusion of a
patient with diffuse histiocytic lymphoma (Sundstrom, C.
and Nilsson, K. (1976) Int. J. Cancer 17:565). U-937 is
one of only a few human cell lines with the morphology,
cytochemistry, surface receptors and monocyte-like
characteristics of histiocytic cells. These cells can be
induced to terminal monocytic differentiation and will
express new cell surface molecules when activated with
supernatants from human mixed lymphocyte cultures. Upon
this type of in vitro activation, the cells undergo
morphological and functional changes, including
augmentation of antibody-dependent cellular cytotoxicity
(ADCC) against erythroid and tumor target cells (one of the
principal functions of macrophages). Activation of U-937
cells with phorbol 12-myristate 13-acetate (PMA) in vitro
stimulates the production of several compounds, including
prostaglandins, leukotrienes and platelet-activating factor
(PAF), which are potent inflammatory mediators. Thus, U-937
is a cell line that is well suited for the
identification and isolation of gene transcripts associated
with normal monocytes.

The HUVEC cell line is a normal, homogeneous, well
characterized, early passage endothelial cell culture from
human umbilical vein (Cell Systems Corp., 12815 NE 124th
Street, Kirkland, WA 98034). Only gene transcripts from
induced, or treated, HUVEC cells were sequenced. One batch
of 1 X 10^ cells was treated for 5 hours with 1 U/ml rIL-1b
and 100 ng/ml E. coli lipopolysaccharide (LPS) endotoxin
prior to harvesting. A separate batch of 2 X 10^ cells was
treated at confluence with 4 U/ml TNF and 2 U/ml
interferon-gamma (IFN-gamma) prior to harvesting.

THP-1 is a human leukemic cell line with distinct
monocytic characteristics. This cell line was derived from
the blood of a 1-year-old boy with acute monocytic leukemia
(Tsuchiya, S. et al. (1980) Int. J. Cancer: 171-76). The
following cytological and cytochemical criteria were used
to determine the monocytic nature of the cell line: 1) the
presence of alpha-naphthyl butyrate esterase activity which
could be inhibited by sodium fluoride; 2) the production of
lysozyme; 3) the phagocytosis of latex particles and
sensitized SRBC (sheep red blood cells); and 4) the ability
of mitomycin C-treated THP-1 cells to activate
T-lymphocytes following ConA (concanavalin A) treatment.
Morphologically, the cytoplasm contained small azurophilic
granules and the nucleus was indented and irregularly
shaped with deep folds. The cell line had Fc and C3b
receptors, probably functioning in phagocytosis. THP-1
cells treated with the tumor promoter
12-o-tetradecanoyl-phorbol-13 acetate (TPA) stop proliferating and
differentiate into macrophage-like cells which mimic native
monocyte-derived macrophages in several respects.
Morphologically, as the cells change shape, the nucleus
becomes more irregular and additional phagocytic vacuoles
appear in the cytoplasm. The differentiated THP-1 cells
also exhibit an increased adherence to tissue culture
plastic.

HMC-1 cells (a human mast cell line) were established
from the peripheral blood of a Mayo Clinic patient with
mast cell leukemia (Leukemia Res. (1988) 12:345-55). The
cultured cells looked similar to immature cloned murine
mast cells, contained histamine, and stained positively for
chloroacetate esterase, amino caproate esterase, eosinophil
major basic protein (MBP) and tryptase. The HMC-1 cells
have, however, lost the ability to synthesize normal IgE
receptors. HMC-1 cells also possess a 10;16 translocation,
present in cells initially collected by leukapheresis from
the patient and not an artifact of culturing. Thus, HMC-1
cells are a good model for mast cells.

6.2. CONSTRUCTION OF cDNA LIBRARIES 

For inter-library comparisons, the libraries must be
prepared in similar manners. Certain parameters appear to
be particularly important to control. One such parameter
is the method of isolating mRNA. It is important to use
the same conditions to remove DNA and heterogeneous nuclear
RNA from comparison libraries. Size fractionation of cDNA
must be carefully controlled. The same vector preferably
should be used for preparing libraries to be compared. At
the very least, the same type of vector (e.g.,
unidirectional vector) should be used to assure a valid
comparison. A unidirectional vector may be preferred in
order to more easily analyze the output.

It is preferred to prime only with oligo dT
unidirectional primer in order to obtain only one clone per
mRNA transcript when obtaining cDNAs. However, it is
recognized that employing a mixture of oligo dT and random
primers can also be advantageous because such a mixture
results in more sequence diversity when gene discovery also
is a goal. Similar effects can be obtained with DR2
(Clontech) and HXLOX (US Biochemical) and also vectors from
Invitrogen and Novagen. These vectors have two
requirements. First, there must be primer sites for
commercially available primers such as T3 or M13 reverse
primers. Second, the vector must accept inserts up to 10
kb.

It also is important that the clones be randomly
sampled, and that a significant population of clones is
used. Data have been generated with 5,000 clones; however,
if very rare genes are to be obtained and/or their relative
abundance determined, as many as 100,000 clones from a
single library may need to be sampled. Size fractionation
of cDNA also must be carefully controlled. Alternately,
plaques can be selected, rather than clones.

Besides the Uni-ZAP™ vector system by Stratagene
disclosed below, it is now believed that other similarly
unidirectional vectors also can be used. For example, it
is believed that such vectors include but are not limited
to DR2 (Clontech), and HXLOX (U.S. Biochemical).

Preferably, the details of library construction (as
shown in Figure 1) are collected and stored in a database
for later retrieval relative to the sequences being
compared. Fig. 1 shows important information regarding the
library collaborator, or cell or cDNA supplier,
pretreatment, biological source, culture, mRNA preparation
and cDNA construction. Similarly detailed information
about the other steps is beneficial in analyzing sequences
and libraries in depth.

RNA must be harvested from cells and tissue samples
and cDNA libraries are subsequently constructed. cDNA
libraries can be constructed according to techniques known
in the art. (See, for example, Maniatis, T. et al. (1982)
Molecular Cloning, Cold Spring Harbor Laboratory, New
York). cDNA libraries may also be purchased. The U-937
cDNA library (catalog No. 937207) was obtained from
Stratagene, Inc., 11099 N. Torrey Pines Rd., La Jolla, CA
92037.

The THP-1 cDNA library was custom constructed by
Stratagene from THP-1 cells cultured 48 hours with 100 nM
TPA and 4 hours with 1 µg/ml LPS. The human mast cell HMC-1
cDNA library was also custom constructed by Stratagene
from cultured HMC-1 cells. The HUVEC cDNA library was
custom constructed by Stratagene from two batches of
induced HUVEC cells which were separately processed.

Essentially, all the libraries were prepared in the
same manner. First, poly(A+) RNA (mRNA) was purified. For
the U-937 and HMC-1 RNA, cDNA synthesis was only primed
with oligo dT. For the THP-1 and HUVEC RNA, cDNA synthesis
was primed separately with both oligo dT and random
hexamers, and the two cDNA libraries were treated
separately. Synthetic adaptor oligonucleotides were
ligated onto cDNA ends enabling its insertion into the
Uni-ZAP™ vector system (Stratagene), allowing high efficiency
unidirectional (sense orientation) lambda library
construction and the convenience of a plasmid system with
blue-white color selection to detect clones with cDNA
insertions. Finally, the two libraries were combined into
a single library by mixing equal numbers of bacteriophage.

The libraries can be screened with either DNA probes
or antibody probes and the pBluescript® phagemid
(Stratagene) can be rapidly excised in vivo. The phagemid
allows the use of a plasmid system for easy insert
characterization, sequencing, site-directed mutagenesis,
the creation of unidirectional deletions and expression of
fusion proteins. The custom-constructed library phage
particles were infected into E. coli host strain XL1-Blue®
(Stratagene), which has a high transformation efficiency,
increasing the probability of obtaining rare,
under-represented clones in the cDNA library.

6.3. ISOLATION OF cDNA CLONES 
The phagemid forms of individual cDNA clones were
obtained by the in vivo excision process, in which the host
bacterial strain was coinfected with both the lambda
library phage and an f1 helper phage. Proteins derived
from both the library-containing phage and the helper phage
nicked the lambda DNA, initiated new DNA synthesis from
defined sequences on the lambda target DNA and created a
smaller, single stranded circular phagemid DNA molecule
that included all DNA sequences of the pBluescript® plasmid
and the cDNA insert. The phagemid DNA was secreted from
the cells and purified, then used to re-infect fresh host
cells, where the double stranded phagemid DNA was produced.
Because the phagemid carries the gene for beta-lactamase,
the newly-transformed bacteria are selected on medium
containing ampicillin.

Phagemid DNA was purified using the Magic Minipreps™
DNA Purification System (Promega catalogue #A7100, Promega
Corp., 2800 Woods Hollow Rd., Madison, WI 53711). This
small-scale process provides a simple and reliable method
for lysing the bacterial cells and rapidly isolating
purified phagemid DNA using a proprietary DNA-binding
resin. The DNA was eluted from the purification resin
already prepared for DNA sequencing and other analytical
manipulations.

Phagemid DNA was also purified using the QIAwell-8
Plasmid Purification System from the QIAGEN® DNA Purification
System (QIAGEN Inc., 9259 Eton Ave., Chatsworth, CA
91311). This product line provides a convenient, rapid and
reliable high-throughput method for lysing the bacterial
cells and isolating highly purified phagemid DNA using
QIAGEN anion-exchange resin particles with EMPORE™ membrane
technology from 3M in a multiwell format. The DNA was
eluted from the purification resin already prepared for DNA
sequencing and other analytical manipulations.

An alternate method of purifying phagemid has recently
become available. It utilizes the Miniprep Kit (Catalog
No. 77468, available from Advanced Genetic Technologies
Corp., 19212 Orbit Drive, Gaithersburg, Maryland). This
kit is in the 96-well format and provides enough reagents
for 960 purifications. Each kit is provided with a
recommended protocol, which has been employed except for
the following changes. First, the 96 wells are each filled
with only 1 ml of sterile terrific broth with carbenicillin
at 25 mg/L and glycerol at 0.4%. After the wells are
inoculated, the bacteria are cultured for 24 hours and
lysed with 60 µl of lysis buffer. A centrifugation step
(2900 rpm for 5 minutes) is performed before the contents
of the block are added to the primary filter plate. The
optional step of adding isopropanol to TRIS buffer is not
routinely performed. After the last step in the protocol,
samples are transferred to a Beckman 96-well block for
storage.

Another new DNA purification system is the WIZARD™ 
product line which is available from Promega (catalog No. 
A7071) and may be adaptable to the 96-well format. 

6.4. SEQUENCING OF cDNA CLONES
The cDNA inserts from random isolates of the U-937 and
THP-1 libraries were sequenced in part. Methods for DNA
sequencing are well known in the art. Conventional
enzymatic methods employ DNA polymerase Klenow fragment,
Sequenase™ or Taq polymerase to extend DNA chains from an
oligonucleotide primer annealed to the DNA template of
interest. Methods have been developed for the use of both
single- and double-stranded templates. The chain
termination reaction products are usually electrophoresed
on urea-acrylamide gels and are detected either by
autoradiography (for radionuclide-labeled precursors) or by
fluorescence (for fluorescent-labeled precursors). Recent
improvements in mechanized reaction preparation, sequencing
and analysis using the fluorescent detection method have
permitted expansion in the number of sequences that can be
determined per day (such as the Applied Biosystems 373 and
377 DNA sequencer, Catalyst 800). Currently with the
system as described, read lengths range from 250 to 400
bases and are clone dependent. Read length also varies
with the length of time the gel is run. In general, the
shorter runs tend to truncate the sequence. A minimum of
only about 25 to 50 bases is necessary to establish the
identification and degree of homology of the sequence.

Gene transcript imaging can be used with any
sequence-specific method, including, but not limited to,
hybridization, mass spectroscopy, capillary electrophoresis
and SDS gel electrophoresis.

6.5. HOMOLOGY SEARCHING OF cDNA CLONE AND
DEDUCED PROTEIN (and Subsequent Steps)

Using the nucleotide sequences derived from the cDNA
clones as query sequences (sequences of a Sequence
Listing), databases containing previously identified
sequences are searched for areas of homology (similarity).
Examples of such databases include Genbank and EMBL. We
next describe examples of two homology search algorithms
that can be used, and then describe the subsequent
computer-implemented steps to be performed in accordance
with preferred embodiments of the invention.

In the following description of the
computer-implemented steps of the invention, the word "library"
denotes a set (or population) of biological specimen
nucleic acid sequences. A "library" can consist of cDNA
sequences, RNA sequences, or the like, which characterize a
biological specimen. The biological specimen can consist
of cells of a single human cell type (or can be any of the
other above-mentioned types of specimens). We contemplate
that the sequences in a library have been determined so as
to accurately represent or characterize a biological
specimen (for example, they can consist of representative
cDNA sequences from clones of RNA taken from a single human
cell).

In the following description of the
computer-implemented steps of the invention, the expression
"database" denotes a set of stored data which represent a
collection of sequences, which in turn represent a
collection of biological reference materials. For example,
a database can consist of data representing many stored
cDNA sequences which are in turn representative of human
cells infected with various viruses, cells of humans of
various ages, cells from different mammalian species, and
so on.

In preferred embodiments, the invention employs a
computer programmed with software (to be described) for
performing the following steps:

(a) processing data indicative of a library of cDNA
sequences (generated as a result of high-throughput cDNA
sequencing or other method) to determine whether each
sequence in the library matches a DNA sequence of a
reference database of DNA sequences (and if so, identifying
the reference database entry which matches the sequence and
indicating the degree of match between the reference
sequence and the library sequence) and assigning an
identified sequence value based on the sequence annotation
and degree of match to each of the sequences in the
library;

(b) for some or all entries of the database,
tabulating the number of matching identified sequence
values in the library (although this can be done by human
hand from a printout of all entries, we prefer to perform
this step using computer software to be described below),
thereby generating a set of final data values or "abundance
numbers"; and

(c) if the libraries are different sizes, dividing
each abundance number by the total number of sequences in
the library, to obtain a relative abundance number for each
identified sequence value (i.e., a relative abundance of
each gene transcript).

The list of identified sequence values (or genes 
corresponding thereto) can then be sorted by abundance in 
the cDNA population. A multitude of additional types of 
comparisons or dimensions are possible. 

For example (to be described below in greater detail),
steps (a) and (b) can be repeated for two different
libraries (sometimes referred to as a "target" library and
a "subtractant" library). Then, for each identified
sequence value (or gene transcript), a "ratio" value is
obtained by dividing the abundance number (for that
identified sequence value) for the target library, by the
abundance number (for that identified sequence value) for
the subtractant library.
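
The ratio computation can be sketched in Python as follows. This is
an illustration only: the function name abundance_ratios is
hypothetical, and the handling of a transcript that is absent from the
subtractant library (reported here as None rather than dividing by
zero) is an assumption, since the text above does not specify it.

```python
def abundance_ratios(target, subtractant):
    """Divide each target abundance number by the subtractant's.

    target, subtractant: dicts mapping identified sequence value ->
    abundance number (hypothetical input format).
    """
    ratios = {}
    for gene, t in target.items():
        s = subtractant.get(gene, 0)
        # Assumption: no ratio is defined when the subtractant library
        # contains no copies of the transcript.
        ratios[gene] = t / s if s else None
    return ratios
```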

In fact, subtraction may be carried out on multiple
libraries. It is possible to add the transcripts from
several libraries (for example, three) and then to divide
them by another set of transcripts from multiple libraries
(again, for example, three). Notation for this operation
may be abbreviated as (A+B+C)/(D+E+F), where the capital
letters each indicate an entire library. Optionally the
abundance numbers of transcripts in the summed libraries
may be divided by the total sample size before subtraction.
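
A minimal Python sketch of the (A+B+C)/(D+E+F) operation follows. The
function name pooled_ratio and its normalize flag are hypothetical, and
reporting None for transcripts absent from the pooled denominator is an
assumption not stated in the text.

```python
from collections import Counter

def pooled_ratio(numerator_libs, denominator_libs, normalize=False):
    """Sum abundance numbers across libraries, then divide pool by pool.

    Each library is a dict mapping transcript -> abundance number.
    With normalize=True, each pooled abundance is first divided by the
    total sample size of its pool.
    """
    num = Counter()
    for lib in numerator_libs:
        num.update(lib)          # adds counts transcript by transcript
    den = Counter()
    for lib in denominator_libs:
        den.update(lib)
    n_scale = sum(num.values()) if normalize else 1
    d_scale = sum(den.values()) if normalize else 1
    ratios = {}
    for gene, n in num.items():
        d = den.get(gene, 0)
        # Assumption: the ratio is undefined when the denominator pool
        # holds no copies of the transcript.
        ratios[gene] = (n / n_scale) / (d / d_scale) if d else None
    return ratios
```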

Unlike standard hybridization technology which permits
a single subtraction of two libraries, once one has
processed a set or library of transcript sequences and stored
them in the computer, any number of subtractions can be
performed on the library. For example, by this method,
ratio values can be obtained by dividing relative abundance
values in a first library by corresponding values in a
second library and vice versa.

In variations on step (a), the library consists of
nucleotide sequences derived from cDNA clones. Examples of
databases which can be searched for areas of homology
(similarity) in step (a) include the commercially available
databases known as Genbank (NIH), EMBL (European Molecular
Biology Labs, Germany), and GENESEQ (Intelligenetics,
Mountain View, California).

One homology search algorithm which can be used to
implement step (a) is the algorithm described in the paper
by D.J. Lipman and W.R. Pearson, entitled "Rapid and
Sensitive Protein Similarity Searches," Science 227:1435
(1985). In this algorithm, the homologous regions are
searched in a two-step manner. In the first step, the
highest homologous regions are determined by calculating a
matching score using a homology score table. The parameter
"Ktup" is used in this step to establish the minimum window
size to be shifted for comparing two sequences. Ktup also
sets the number of bases that must match to extract the
highest homologous region among the sequences. In this
step, no insertions or deletions are applied and the
homology is displayed as an initial (INIT) value.
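
The role of Ktup in the first step can be illustrated with a
simplified Python sketch. It is not the published algorithm: it merely
counts exact Ktup-length word matches along each ungapped diagonal and
omits the homology score table. The function name best_diagonal is
hypothetical.

```python
from collections import defaultdict

def best_diagonal(query, subject, ktup=4):
    """Find the ungapped diagonal sharing the most ktup-length words.

    Returns (offset, hits), where offset = subject index - query index,
    or None when no word of length ktup is shared.
    """
    # Index every ktup-length word of the query by its start positions.
    positions = defaultdict(list)
    for i in range(len(query) - ktup + 1):
        positions[query[i:i + ktup]].append(i)
    # Each shared word votes for the diagonal on which it lies; no
    # insertions or deletions are considered at this stage.
    diagonal_hits = defaultdict(int)
    for j in range(len(subject) - ktup + 1):
        for i in positions.get(subject[j:j + ktup], ()):
            diagonal_hits[j - i] += 1
    if not diagonal_hits:
        return None
    return max(diagonal_hits.items(), key=lambda kv: (kv[1], -abs(kv[0])))
```

A larger ktup makes the search faster and more selective, at the cost
of missing regions whose exact matches are shorter than ktup.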

In the second step, the homologous regions are aligned
to obtain the highest matching score by inserting a gap in
order to add a probable deleted portion. The matching
score obtained in the first step is recalculated using the
homology score table and the insertion score table to an
optimized (OPT) value in the final output.

DNA homologies between two sequences can be examined
graphically using the Harr method of constructing dot
matrix homology plots (Needleman, S.B. and Wunsch, C.D., J.
Mol. Biol. 48:443 (1970)). This method produces a
two-dimensional plot which can be useful in determining
regions of homology versus regions of repetition.
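
A dot matrix plot of this general kind can be sketched in Python as a
text grid. This is an illustration under simple assumptions (exact
matching of fixed-length windows, with "*" marking a match); the
function name dot_matrix and the window parameter are hypothetical.

```python
def dot_matrix(seq_a, seq_b, window=3):
    """Return rows of a text dot plot comparing two sequences.

    Row i, column j holds "*" when the window starting at seq_a[i]
    equals the window starting at seq_b[j], and "." otherwise.
    """
    rows = []
    for i in range(len(seq_a) - window + 1):
        word = seq_a[i:i + window]
        row = "".join(
            "*" if word == seq_b[j:j + window] else "."
            for j in range(len(seq_b) - window + 1)
        )
        rows.append(row)
    return rows
```

A single unbroken diagonal indicates a region of homology, whereas
parallel off-diagonal runs (as when "ATAT" is compared with itself)
indicate repetition.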

However, in a class of preferred embodiments, step (a)
is implemented by processing the library data in the
commercially available computer program known as the
INHERIT 670 Sequence Analysis System, available from
Applied Biosystems Inc. (Foster City, California),
including the software known as the Factura software (also
available from Applied Biosystems Inc.). The Factura
program preprocesses each library sequence to "edit out"
portions thereof which are not likely to be of interest,
such as the vector used to prepare the library. Additional
sequences which can be edited out or masked (ignored by the
search tools) include but are not limited to the polyA tail
and repetitive GAG and CCC sequences. A low-end search
program can be written to mask out such "low-information"
sequences, or programs such as BLAST can ignore the
low-information sequences.
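
As a hedged illustration of such masking (not the Factura program
itself), the following Python sketch masks a trailing polyA run. The
function name mask_poly_a_tail, the minimum run length of 10 and the
use of "N" as the mask character are all assumptions.

```python
import re

def mask_poly_a_tail(seq, min_run=10, mask_char="N"):
    """Replace a trailing run of at least min_run A's with mask_char."""
    # \Z anchors the run of A's to the very end of the sequence.
    match = re.search(r"A{%d,}\Z" % min_run, seq.upper())
    if match is None:
        return seq
    start = match.start()
    return seq[:start] + mask_char * (len(seq) - start)
```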

In the algorithm implemented by the INHERIT 670
Sequence Analysis System, the Pattern Specification
Language (developed by TRW Inc.) is used to determine
regions of homology. "There are three parameters that
determine how INHERIT analysis runs sequence comparisons:
window size, window offset and error tolerance. Window
size specifies the length of the segments into which the
query sequence is subdivided. Window offset specifies
where to start the next segment [to be compared], counting
from the beginning of the previous segment. Error
tolerance specifies the total number of insertions,
deletions and/or substitutions that are tolerated over the
specified word length. Error tolerance may be set to any
integer between 0 and 6. The default settings are window
size=20, window offset=10 and error tolerance=3."
INHERIT Analysis Users Manual, pp. 2-15, Version 1.0,
Applied Biosystems, Inc., October 1991.
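
The subdivision of a query by window size and window offset can be
sketched in Python. This is an illustration of the two parameters
only, not of the INHERIT software; the function name query_windows is
hypothetical, the defaults of 20 and 10 follow the values quoted
above, and dropping a trailing segment shorter than the window is an
assumption. Error tolerance is not modeled here.

```python
def query_windows(query, window_size=20, window_offset=10):
    """Split a query into segments of window_size, stepping by offset.

    Returns a list of (start, segment) pairs. Each segment begins
    window_offset positions after the start of the previous one.
    """
    if window_size <= 0 or window_offset <= 0:
        raise ValueError("window_size and window_offset must be positive")
    return [
        (start, query[start:start + window_size])
        for start in range(0, len(query) - window_size + 1, window_offset)
    ]
```

With the default values, consecutive segments overlap by 10 bases, so
every position away from the ends is covered by two segments.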

Using a combination of these three parameters, a
database (such as a DNA database) can be searched for
sequences containing regions of homology and the
appropriate sequences are scored with an initial value.
Subsequently, these homologous regions are examined using
dot matrix homology plots to determine regions of homology
versus regions of repetition. Smith-Waterman alignments
can be used to display the results of the homology search.
The INHERIT software can be executed by a Sun computer
system programmed with the UNIX operating system.

Search alternatives to INHERIT include the BLAST
program, GCG (available from the Genetics Computer Group,
WI) and the Dasher program (Temple Smith, Boston
University, Boston, MA). Nucleotide sequences can be
searched against Genbank, EMBL or custom databases such as
GENESEQ (available from Intelligenetics, Mountain View, CA)
or other databases for genes. In addition, we have
searched some sequences against our own in-house database.
In preferred embodiments, the transcript sequences are
analyzed by the INHERIT software for best conformance with
a reference gene transcript. Each is assigned a sequence
identifier and a degree of homology, which together
constitute the identified sequence value. The identified
sequence values are input into, and further processed by,
a Macintosh personal computer (available from Apple)
programmed with an "abundance sort and subtraction
analysis" computer program (to be described below).

Prior to the abundance sort and subtraction analysis
program (also denoted as the "abundance sort" program),
identified sequences from the cDNA clones are assigned a
value (according to the parameters given above) by degree
of match according to the following categories: "exact"
matches (regions with a high degree of identity),
homologous human matches (regions of high similarity, but
not "exact" matches), homologous non-human matches (regions
of high similarity present in species other than human), or
non matches (no significant regions of homology to
previously identified nucleotide sequences stored in the
form of the database). Alternately, the degree of match
can be a numeric value as described below.

With reference again to the step of identifying
matches between reference sequences and database entries,
protein and peptide sequences can be deduced from the
nucleic acid sequences. Using the deduced polypeptide
sequence, the match identification can be performed in a
manner analogous to that done with cDNA sequences. A
protein sequence is used as a query sequence and compared
to the previously identified sequences contained in a
database such as the Swiss-Prot, PIR and the NBRF Protein
database to find homologous proteins. These proteins are
initially scored for homology using a homology score table
(Orcutt, B.C. and Dayhoff, M.O., Scoring Matrices, PIR
Report MAT - 0285 (February 1985)) resulting in an INIT
score. The homologous regions are aligned to obtain the
highest matching scores by inserting a gap which adds a
probable deleted portion. The matching score is
recalculated using the homology score table and the
insertion score table resulting in an optimized (OPT)
score. Even in the absence of knowledge of the proper
reading frame of an isolated sequence, the above-described
protein homology search may be performed by searching all 3
reading frames.
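
Deducing a protein query in each of the three forward reading frames
can be sketched in Python using the standard genetic code. The
function name translate_three_frames is hypothetical, and rendering
stop codons as "*" and unrecognized codons as "X" is a convention
assumed here, not one taken from the text.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate_three_frames(dna):
    """Translate a nucleotide sequence in forward frames 0, 1 and 2."""
    dna = dna.upper().replace("U", "T")
    frames = []
    for offset in range(3):
        # Step through complete codons only; a trailing partial codon
        # is ignored.
        codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
        frames.append("".join(CODON_TABLE.get(c, "X") for c in codons))
    return frames
```

Each of the three resulting peptide strings can then be used as a
separate query against a protein database.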

Peptide and protein sequence homologies can also be
ascertained using the INHERIT 670 Sequence Analysis System
in an analogous way to that used in DNA sequence
homologies. Pattern Specification Language and parameter
windows are used to search protein databases for sequences
containing regions of homology which are scored with an
initial value. Subsequent display in a dot-matrix homology
plot shows regions of homology versus regions of
repetition. Additional search tools that are available to
use on pattern search databases include PLsearch Blocks
(available from Henikoff & Henikoff, University of
Washington, Seattle), Dasher and GCG. Pattern search
databases include, but are not limited to, Protein Blocks
(available from Henikoff & Henikoff, University of
Washington, Seattle), Brookhaven Protein (available from
the Brookhaven National Laboratory, Upton, NY),
PROSITE (available from Amos Bairoch, University of Geneva,
Switzerland), ProDom (available from Temple Smith, Boston
University), and PROTEIN MOTIF FINGERPRINT (available from
University of Leeds, United Kingdom).

The ABI Assembler application software, part of the
INHERIT DNA analysis system (available from Applied
Biosystems, Inc., Foster City, CA), can be employed to
create and manage sequence assembly projects by assembling
data from selected sequence fragments into a larger
sequence. The Assembler software combines two advanced
computer technologies which maximize the ability to
assemble sequenced DNA fragments into Assemblages, a
special grouping of data where the relationships between
sequences are shown by graphic overlap, alignment and
statistical views. The process is based on the
Myers-Kececioglu model of fragment assembly (INHERIT™
Assembler User's Manual, Applied Biosystems, Inc., Foster
City, CA), and uses graph theory as the foundation of a
very rigorous multiple sequence alignment engine for
assembling DNA sequence fragments. Other assembly programs
that can be used include MEGALIGN (available from DNASTAR
Inc., Madison, WI), Dasher and STADEN (available from Roger
Staden, Cambridge, England).

Next, with reference to Fig. 2, we describe in more
detail the "abundance sort" program which implements
above-mentioned "step (b)" to tabulate the number of sequences of
the library which match each database entry (the "abundance
number" for each database entry).

Fig. 2 is a flow chart of a preferred embodiment of
the abundance sort program. A source code listing of this
embodiment of the abundance sort program is set forth in
Table 5. In the Table 5 implementation, the abundance sort
program is written using the FoxBASE programming language
commercially available from Microsoft Corporation.
Although FoxBASE was the program chosen for the first
iteration of this technology, it should not be considered
limiting. Many other programming languages, Sybase being a
particularly desirable alternative, can also be used, as
will be obvious to one with ordinary skill in the art. The
subroutine names specified in Fig. 2 correspond to
subroutines listed in Table 5.

With reference again to Fig. 2, the "Identified
Sequences" are transcript sequences representing each
sequence of the library and a corresponding identification
of the database entry (if any) which it matches. In other
words, the "Identified Sequences" are transcript sequences
representing the output of above-discussed "step (a)."

Fig. 3 is a block diagram of a system for implementing
the invention. The Fig. 3 system includes library
generation unit 2 which generates a library and asserts an
output stream of transcript sequences indicative of the
biological sequences comprising the library. Programmed
processor 4 receives the data stream output from unit 2 and
processes this data in accordance with above-discussed
"step (a)" to generate the Identified Sequences. Processor
4 can be a processor programmed with the commercially
available computer program known as the INHERIT 670
Sequence Analysis System and the commercially available
computer program known as the Factura program (both
available from Applied Biosystems Inc.) and with the UNIX
operating system.

Still with reference to Fig. 3, the Identified
Sequences are loaded into processor 6 which is programmed
with the abundance sort program. Processor 6 generates the
Final Transcript sequences indicated in both Figs. 2 and 3.
Fig. 4 shows a more detailed block diagram of a planned
relational computer system, including various searching
techniques which can be implemented, along with an
assortment of databases to query against.

With reference to Fig. 2, the abundance sort program
first performs an operation known as "Tempnum" on the
Identified Sequences, to discard all of the Identified
Sequences except those which match database entries of
selected types. For example, the Tempnum process can
select Identified Sequences which represent matches of the
following types with database entries (see above for
definition): "exact" matches, human "homologous" matches,
"other species" matches (representing genes present in
species other than human), "no" matches (no significant
regions of homology with database entries representing
previously identified nucleotide sequences), "I" matches
(Incyte for not previously known DNA sequences), or "X"
matches (matches ESTs in reference database). This
eliminates the U, S, M, V, A, R and D sequences (see Table 1
for definitions).

The identified sequence values selected during the
"Tempnum" process then undergo a further selection (weeding
out) operation known as "Tempred." This operation can, for
example, discard all identified sequence values
representing matches with selected database entries.
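
The two selection steps described above can be pictured as a pair of filters. The following Python sketch is illustrative only and is not the program of Table 5: the record type, the match-type labels ("exact", "homologous", "other_species", "no_match", "I", "X") and the function names are assumptions introduced here, since the actual designation codes are defined in Table 1.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set

@dataclass(frozen=True)
class IdentifiedSequence:
    clone_id: str
    match_type: str          # hypothetical label for the kind of database match
    entry: Optional[str]     # matching database entry, if any

# Hypothetical labels for the match types retained by "Tempnum".
KEPT_TYPES = frozenset(
    {"exact", "homologous", "other_species", "no_match", "I", "X"}
)

def tempnum(seqs: Iterable[IdentifiedSequence],
            kept_types: frozenset = KEPT_TYPES) -> List[IdentifiedSequence]:
    """Keep only sequences whose match type is one of the selected types."""
    return [s for s in seqs if s.match_type in kept_types]

def tempred(seqs: Iterable[IdentifiedSequence],
            excluded_entries: Set[str]) -> List[IdentifiedSequence]:
    """Weed out sequences that match any of the selected database entries."""
    return [s for s in seqs if s.entry not in excluded_entries]
```

In this sketch a sequence carrying one of the eliminated designations (for example "U") is dropped by tempnum, and tempred then removes whichever entries the user has chosen to exclude.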

The identified sequence values selected during the
"Tempred" process are then classified according to library,
during the "Tempdesig" operation. It is contemplated that
the "Identified Sequences" can represent sequences from a
single library, or from two or more libraries.

Consider first the case that the identified sequence
values represent sequences from a single library. In this
case, all the identified sequence values determined during
"Tempred" undergo sorting in the "Templib" operation,
further sorting in the "Libsort" operation, and finally
additional sorting in the "Temptarsort" operation. For
example, these three sorting operations can sort the
identified sequences in order of decreasing "abundance
number" (to generate a list of decreasing abundance
numbers, each abundance number corresponding to a unique
identified sequence entry, or several lists of decreasing
abundance numbers, with the abundance numbers in each list
corresponding to database entries of a selected type) with
redundancies eliminated from each sorted list. In this
case, the operation identified as "Cruncher" can be
bypassed, so that the "Final Data" values are the organized
transcript sequences produced during the "Temptarsort"
operation.
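
As a rough illustration of the single-library case, the sketch below collapses redundant identified sequences into one row per database entry and sorts the rows by decreasing abundance number. It is a minimal Python sketch under stated assumptions, not the Templib, Libsort and Temptarsort routines themselves: the function name abundance_sort and the tie-breaking by entry name are choices made here for the example.

```python
from collections import Counter
from typing import Iterable, List, Tuple

def abundance_sort(entries: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (entry, abundance number) pairs in order of decreasing abundance.

    Each element of `entries` is the database entry matched by one clone.
    Counting collapses redundant clones into a single row per entry.
    Ties are broken by entry name (an assumption made for reproducibility).
    """
    counts = Counter(entries)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

For example, abundance_sort(["A", "B", "A", "C", "B", "A"]) yields A with abundance 3, B with 2 and C with 1.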

We next consider the case that the transcript
sequences produced during the "Tempred" operation represent
sequences from two libraries (which we will denote the
"target" library and the "subtractant" library). For
example, the target library may consist of cDNA sequences
from clones of a diseased cell, while the subtractant
library may consist of cDNA sequences from clones of the
diseased cell after treatment by exposure to a drug. For
another example, the target library may consist of cDNA
sequences from clones of a cell type from a young human,
while the subtractant library may consist of cDNA sequences
from clones of the same cell type from the same human at
different ages.

In this case, the "Tempdesig" operation routes all
transcript sequences representing the target library for
processing in accordance with "Templib" (and then "Libsort"
and "Temptarsort"), and routes all transcript sequences
representing the subtractant library for processing in
accordance with "Tempsub" (and then "Subsort" and
"Tempsubsort"). For example, the consecutive "Templib,"
"Libsort," and "Temptarsort" sorting operations sort
identified sequences from the target library in order of
decreasing abundance number (to generate a list of
decreasing abundance numbers, each abundance number
corresponding to a database entry, or several lists of
decreasing abundance numbers, with the abundance numbers in
each list corresponding to database entries of a selected
type) with redundancies eliminated from each sorted list.
The consecutive "Tempsub," "Subsort," and "Tempsubsort"
sorting operations sort identified sequences from the
subtractant library in order of decreasing abundance number
(to generate a list of decreasing abundance numbers, each
abundance number corresponding to a database entry, or
several lists of decreasing abundance numbers, with the
abundance numbers in each list corresponding to database
entries of a selected type) with redundancies eliminated
from each sorted list.

The transcript sequences output from the "Temptarsort"
operation typically represent sorted lists from which a
histogram could be generated in which position along one
(e.g., horizontal) axis indicates abundance number (of
target library sequences), and position along another
(e.g., vertical) axis indicates identified sequence value
(e.g., human or non-human gene type). Similarly, the
transcript sequences output from the "Tempsubsort"
operation typically represent sorted lists from which a
histogram could be generated in which position along one
(e.g., horizontal) axis indicates abundance number (of
subtractant library sequences), and position along another
(e.g., vertical) axis indicates identified sequence value
(e.g., human or non-human gene type).

The transcript sequences (sorted lists) output from
the Tempsubsort and Temptarsort sorting operations are
combined during the operation identified as "Cruncher."
The "Cruncher" process identifies pairs of corresponding
target and subtractant abundance numbers (both representing
the same identified sequence value), and divides one by the
other to generate a "ratio" value for each pair of
corresponding abundance numbers, and then sorts the ratio
values in order of decreasing ratio value. The data output
from the "Cruncher" operation (the Final Transcript
sequence in Fig. 2) is typically a sorted list from which a
histogram could be generated in which position along one
axis indicates the size of a ratio of abundance numbers
(for corresponding identified sequence values from target
and subtractant libraries) and position along another axis
indicates identified sequence value (e.g., gene type).

Preferably, prior to obtaining a ratio between the two
library abundance values, the Cruncher operation also
divides each ratio value by the total number of sequences
in one or both of the target and subtractant libraries.
The resulting lists of "relative" ratio values generated by
the Cruncher operation are useful for many medical,
scientific, and industrial applications. Also preferably,
the output of the Cruncher operation is a set of lists,
each list representing a sequence of decreasing ratio
values for a different selected subset (e.g., protein
family) of database entries.
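
The pairing, division and sorting performed by "Cruncher" can be sketched as follows. This Python fragment is a hedged illustration rather than the Table 5 routine: the name cruncher, the dictionary inputs and the optional totals are assumptions, and the normalization shown (each abundance number divided by the total for its own library before the ratio is taken) is one reading of the "relative" ratio described above.

```python
from typing import Dict, List, Optional, Tuple

def cruncher(target: Dict[str, int],
             subtractant: Dict[str, int],
             target_total: Optional[int] = None,
             subtractant_total: Optional[int] = None) -> List[Tuple[str, float]]:
    """Pair abundance numbers by entry, divide, and sort by decreasing ratio.

    Only entries present in both libraries are paired in this sketch.
    If both totals are given, each abundance number is first divided by
    the total number of sequences in its own library (assumed reading).
    """
    rows = []
    for entry in target.keys() & subtractant.keys():
        t = float(target[entry])
        s = float(subtractant[entry])
        if target_total and subtractant_total:
            t /= target_total
            s /= subtractant_total
        rows.append((entry, t / s))
    # Decreasing ratio; entry name breaks ties for a stable listing.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows
```

With a target count of 10 against a subtractant count of 2, the raw ratio is 5.0; if the libraries hold 100 and 50 sequences respectively, the relative ratio becomes 2.5.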

In one example, the abundance sort program of the
invention tabulates for a library the numbers of mRNA
transcripts corresponding to each gene identified in a
database. These numbers are divided by the total number of
clones sampled. The results of the division reflect the
relative abundance of the mRNA transcripts in the cell type
or tissue from which they were obtained. Obtaining this
final data set is referred to herein as "gene transcript
image analysis." The resulting subtracted data show
exactly what proteins and genes are upregulated and
downregulated in highly detailed complexity.

6.6. HUVEC cDNA LIBRARY
Table 2 is an abundance table listing the various gene
transcripts in an induced HUVEC library. The transcripts
are listed in order of decreasing abundance. This
computerized sorting simplifies analysis of the tissue and
speeds identification of significant new proteins which are
specific to this cell type. This type of endothelial cell
lines tissues of the cardiovascular system, and the more
that is known about its composition, particularly in
response to activation, the more choices of protein targets
become available to affect in treating disorders of this
tissue, such as the highly prevalent atherosclerosis.

6.7. MONOCYTE-CELL AND MAST-CELL cDNA LIBRARIES
Tables 3 and 4 show truncated comparisons of two
libraries. In Tables 3 and 4 the "normal monocytes" are
the HMC-1 cells, and the "activated macrophages" are the
THP-1 cells pretreated with PMA and activated with LPS.
Table 3 lists in descending order of abundance the most
abundant gene transcripts for both cell types. With only
15 gene transcripts from each cell type, this table permits
quick, qualitative comparison of the most common
transcripts. This abundance sort, with its convenient
side-by-side display, provides an immediately useful
research tool. In this example, this research tool
discloses that 1) only one of the top 15 activated
macrophage transcripts is found in the top 15 normal
monocyte gene transcripts (poly A binding protein); and 2)
a new gene transcript (previously unreported in other
databases) is relatively highly represented in activated
macrophages but is not similarly prominent in normal
macrophages. Such a research tool provides researchers
with a short-cut to new proteins, such as receptors,
cell-surface and intracellular signalling molecules, which can
serve as drug targets in commercial drug screening
programs. Such a tool could save considerable time over
that consumed by a hit and miss discovery program aimed at
identifying important proteins in and around cells, because
those proteins carrying out everyday cellular functions and
represented as steady state mRNA are quickly eliminated
from further characterization.

This illustrates how the gene transcript profiles
change with altered cellular function. Those skilled in
the art know that the biochemical composition of cells also
changes with other functional changes such as cancer,
including cancer's various stages, and exposure to
toxicity. A gene transcript subtraction profile such as in
Table 3 is useful as a first screening tool for such gene
expression and protein studies.

6.8. SUBTRACTION ANALYSIS OF NORMAL MONOCYTE-CELL AND
ACTIVATED MONOCYTE CELL cDNA LIBRARIES

Once the cDNA data are in the computer, the computer
program as disclosed in Table 5 was used to obtain ratios
of all the gene transcripts in the two libraries discussed
in Example 6.7, and the gene transcripts were sorted by the
descending values of their ratios. If a gene transcript is
not represented in one library, that gene transcript's
abundance is unknown but appears to be less than 1. As an
approximation (and to obtain a ratio, which would not be
possible if the unrepresented gene were given an abundance
of zero), genes which are represented in only one of the
two libraries are assigned an abundance of 1/2. Using 1/2
for unrepresented clones increases the relative importance
of "turned-on" and "turned-off" genes, whose products would
be drug candidates. The resulting print-out is called a
subtraction table and is an extremely valuable screening
method, as is shown by the following data.
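
The substitution of 1/2 for an unrepresented transcript can be checked against the figures in Table 4. The Python sketch below is illustrative only and is not the Table 5 program: the function name subtraction_table and its arguments are assumptions, and the entry "ONLYBG" in the usage note is a made-up example of a "turned-off" gene.

```python
from typing import Dict, List, Tuple

def subtraction_table(target: Dict[str, int],
                      background: Dict[str, int],
                      floor: float = 0.5) -> List[Tuple[str, int, int, float]]:
    """Build (entry, bgfreq, target count, ratio) rows sorted by decreasing ratio.

    A transcript absent from one library is given an abundance of `floor`
    (1/2 in the text) so that a ratio can still be computed.
    """
    rows = []
    for entry in target.keys() | background.keys():
        t = target.get(entry, 0)
        b = background.get(entry, 0)
        ratio = (t or floor) / (b or floor)
        rows.append((entry, b, t, ratio))
    rows.sort(key=lambda row: (-row[3], row[0]))
    return rows
```

Using counts from Table 4, a transcript seen 131 times in the activated library and 0 times in the background library receives a ratio of 131 / 0.5 = 262, while one seen 121 times against 3 receives 121 / 3, about 40.333. A hypothetical entry "ONLYBG" seen 4 times only in the background library would receive 0.5 / 4 = 0.125 and sort to the bottom.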

Table 4 is a subtraction table, in which the normal
monocyte library was electronically "subtracted" from the
activated macrophage library. This table highlights most
effectively the changes in abundance of the gene
transcripts by activation of macrophages. Even among the
first 20 gene transcripts listed, there are several unknown
gene transcripts. Thus, electronic subtraction is a useful
tool with which to assist researchers in identifying much
more quickly the basic biochemical changes between two cell
types. Such a tool can save universities and
pharmaceutical companies which spend billions of dollars on
research valuable time and laboratory resources at the
early discovery stage and can speed up the drug development
cycle, which in turn permits researchers to set up drug
screening programs much earlier. Thus, this research tool
provides a way to get new drugs to the public faster and
more economically.

Also, such a subtraction table can be obtained for
patient diagnosis. An individual patient sample (such as
monocytes obtained from a biopsy or blood sample) can be
compared with data provided herein to diagnose conditions
associated with macrophage activation.

Table 4 uncovered many new gene transcripts (labeled
Incyte clones). Note that many genes are turned on in the
activated macrophage (i.e., the monocyte had a 0 in the
bgfreq column). This screening method is superior to other
screening techniques, such as the western blot, which are
incapable of uncovering such a multitude of discrete new
gene transcripts.

The subtraction-screening technique has also uncovered
a high number of cancer gene transcripts (oncogenes rho,
ETS2, rab-2 ras, YPT1-related, and acute myeloid leukemia
mRNA) in the activated macrophage. These transcripts may
be attributed to the use of immortalized cell lines and are
inherently interesting for that reason. This screening
technique offers a detailed picture of upregulated
transcripts including oncogenes, which helps explain why
anti-cancer drugs interfere with the patient's immunity
mediated by activated macrophages. Armed with knowledge
gained from this screening method, those skilled in the art
can set up more targeted, more effective drug screening
programs to identify drugs which are differentially
effective against 1) both relevant cancers and activated
macrophage conditions with the same gene transcript
profile; 2) cancer alone; and 3) activated macrophage
conditions.

Smooth muscle senescent protein (22 kd) was
upregulated in the activated macrophage, which indicates
that it is a candidate to block in controlling
inflammation.

6.9. SUBTRACTION ANALYSIS OF NORMAL LIVER CELLS AND
HEPATITIS INFECTED LIVER CELL cDNA LIBRARIES

In this example, rats are exposed to hepatitis virus
and maintained in the colony until they show definite signs
of hepatitis. Of the rats diagnosed with hepatitis, one
half of the rats are treated with a new anti-hepatitis
agent (AHA). Liver samples are obtained from all rats
before exposure to the hepatitis virus and at the end of
AHA treatment or no treatment. In addition, liver samples
can be obtained from rats with hepatitis just prior to AHA
treatment.

The liver tissue is treated as described in Examples
6.2 and 6.3 to obtain mRNA and subsequently to sequence
cDNA. The cDNA from each sample are processed and analyzed
for abundance according to the computer program in Table 5.
The resulting gene transcript images of the cDNA provide
detailed pictures of the baseline (control) for each animal
and of the infected and/or treated state of the animals.
cDNA data for a group of samples can be combined into a
group summary gene transcript profile for all control
samples, all samples from infected rats and all samples
from AHA-treated rats.

Subtractions are performed between appropriate
individual libraries and the grouped libraries. For
individual animals, control and post-study samples can be
subtracted. Also, if samples are obtained before and after
AHA treatment, that data from individual animals and
treatment groups can be subtracted. In addition, the data
for all control samples can be pooled and averaged. The
control average can be subtracted from averages of both
post-study AHA and post-study non-AHA cDNA samples. If
pre- and post-treatment samples are available, pre- and
post-treatment samples can be compared individually (or
electronically averaged) and subtracted.

These subtraction tables are used in two general ways.
First, the differences are analyzed for gene transcripts
which are associated with continuing hepatic deterioration
or healing. The subtraction tables are tools to isolate
the effects of the drug treatment from the underlying basic
pathology of hepatitis. Because hepatitis affects many
parameters, additional liver toxicity has been difficult to
detect with only blood tests for the usual enzymes. The
gene transcript profile and subtraction provide a much
more complex biochemical picture which researchers have
needed to analyze such difficult problems.

Second, the subtraction tables provide a tool for
identifying clinical markers, individual proteins or other
biochemical determinants which are used to predict and/or
evaluate a clinical endpoint, such as disease, improvement
due to the drug, and even additional pathology due to the
drug. The subtraction tables specifically highlight genes
which are turned on or off. Thus, the subtraction tables
provide a first screen for a set of gene transcript
candidates for use as clinical markers. Subsequently,
electronic subtractions of additional cell and tissue
libraries reveal which of the potential markers are in fact
found in different cell and tissue libraries. Candidate
gene transcripts found in additional libraries are removed
from the set of potential clinical markers. Then, tests of
blood or other relevant samples which are known to lack and
have the relevant condition are compared to validate the
selection of the clinical marker. In this method, the
particular physiologic function of the protein transcript
need not be determined to qualify the gene transcript as a
clinical marker.
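
The two-stage marker screen described above (select genes that are turned on or off, then discard candidates that also appear in other libraries) can be sketched in Python. This is a minimal illustration under assumptions made here: the function names, the dictionary representation of a library and the strict "count of zero" test for on/off are not taken from the specification.

```python
from typing import Dict, Iterable, List, Set

def turned_on_or_off(target: Dict[str, int],
                     background: Dict[str, int]) -> Set[str]:
    """Entries present in exactly one of the two libraries."""
    on = {e for e, n in target.items() if n > 0 and background.get(e, 0) == 0}
    off = {e for e, n in background.items() if n > 0 and target.get(e, 0) == 0}
    return on | off

def screen_markers(candidates: Set[str],
                   other_libraries: Iterable[Dict[str, int]]) -> List[str]:
    """Remove candidates that are also found in any additional library."""
    seen: Set[str] = set()
    for library in other_libraries:
        seen.update(e for e, n in library.items() if n > 0)
    return sorted(candidates - seen)
```

The surviving entries are only candidates; as the text notes, they still require validation against samples known to have and to lack the relevant condition.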

6.10. ELECTRONIC NORTHERN BLOT
One limitation of electronic subtraction is that it is
difficult to compare more than a pair of images at once.
Once particular individual gene products are identified as
relevant to further study (via electronic subtraction or
other methods), it is useful to study the expression of
single genes in a multitude of different tissues. In the
lab, the technique of "Northern" blot hybridization is used
for this purpose. In this technique, a single cDNA, or a
probe corresponding thereto, is labeled and then hybridized
against a blot containing RNA samples prepared from a
multitude of tissues or cell types. Upon autoradiography,
the pattern of expression of that particular gene, one at a
time, can be quantitated in all the included samples.

In contrast, a further embodiment of this invention is
the computerized form of this process, termed here
"electronic northern blot." In this variation, a single
gene is queried for expression against a multitude of
prepared and sequenced libraries present within the
database. In this way, the pattern of expression of any
single candidate gene can be examined instantaneously and
effortlessly. More candidate genes can thus be scanned,
leading to more frequent and fruitfully relevant
discoveries. The computer program included as Table 5
includes a program for performing this function, and Table
6 is a partial listing of entries of the database used in
the electronic northern blot analysis.
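
An electronic northern blot amounts to looking up one gene across many libraries. The following Python sketch is an assumption-laden illustration, not the Table 5 routine: the function name, the representation of each library as a (counts, total clones) pair and the library names in the usage note are hypothetical.

```python
from typing import Dict, List, Tuple

Library = Tuple[Dict[str, int], int]  # (entry -> clone count, total clones analyzed)

def electronic_northern(gene: str,
                        libraries: Dict[str, Library]) -> List[Tuple[str, int, float]]:
    """Report (library name, clone count, relative abundance) for one gene."""
    rows = []
    for name, (counts, total) in libraries.items():
        n = counts.get(gene, 0)
        rows.append((name, n, n / total if total else 0.0))
    return rows
```

For instance, with the Table 2 figure of 67 HSRPL41 clones among 5000 HUVEC clones analyzed, the relative abundance reported for that library is 67 / 5000 = 0.0134; a library in which the gene was never seen reports 0.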

6.11. PHASE I CLINICAL TRIALS
Based on the establishment of safety and effectiveness
in the above animal tests, Phase I clinical tests are
undertaken. Normal patients are subjected to the usual
preliminary clinical laboratory tests. In addition,
appropriate specimens are taken and subjected to gene
transcript analysis. Additional patient specimens are
taken at predetermined intervals during the test. The
specimens are subjected to gene transcript analysis as
described above. In addition, the gene transcript changes
noted in the earlier rat toxicity study are carefully
evaluated as clinical markers in the followed patients.
Changes in the gene transcript analyses are evaluated as
indicators of toxicity by correlation with clinical signs
and symptoms and other laboratory results. In addition,
subtraction is performed on individual patient specimens
and on averaged patient specimens. The subtraction
analysis highlights any toxicological changes in the
treated patients. This is a highly refined determinant of
toxicity. The subtraction method also annotates clinical
markers. Further subgroups can be analyzed by subtraction
analysis, including, for example, 1) segregation by
occurrence and type of adverse effect; and 2) segregation
by dosage.

6.12. GENE TRANSCRIPT IMAGING ANALYSIS IN CLINICAL STUDIES
A gene transcript imaging analysis (or multiple gene
transcript imaging analyses) is a useful tool in other
clinical studies. For example, the differences in gene
transcript imaging analyses before and after treatment can
be assessed for patients on placebo and drug treatment.
This method also effectively screens for clinical markers
to follow in clinical use of the drug.

6.13. COMPARATIVE GENE TRANSCRIPT ANALYSIS BETWEEN SPECIES
The subtraction method can be used to screen cDNA
libraries from diverse sources. For example, the same cell
types from different species can be compared by gene
transcript analysis to screen for specific differences,
such as in detoxification enzyme systems. Such testing
aids in the selection and validation of an animal model for
the commercial purpose of drug screening or toxicological
testing of drugs intended for human or animal use. When
the comparison between animals of different species is
shown in columns for each species, we refer to this as an
interspecies comparison, or zoo blot.

Embodiments of this invention may employ databases
such as those written using the FoxBASE programming
language commercially available from Microsoft Corporation.
Other embodiments of the invention employ other databases,
such as a random peptide database, a polymer database, a
synthetic oligomer database, or an oligonucleotide database
of the type described in U.S. Patent 5,270,170, issued
December 14, 1993 to Cull, et al., PCT International
Application Publication No. WO 9322684, published November
11, 1993, PCT International Application Publication No. WO
9306121, published April 1, 1993, or PCT International
Application Publication No. WO 9119818, published December
26, 1991. These four references (whose text is
incorporated herein by reference) include teaching which
may be applied in implementing such other embodiments of
the present invention.

All references referred to in the preceding text are
hereby expressly incorporated by reference herein.
Various modifications and variations of the described
method and system of the invention will be apparent to
those skilled in the art without departing from the scope
and spirit of the invention. Although the invention has
been described in connection with specific preferred
embodiments, it should be understood that the invention as
claimed should not be unduly limited to such specific
embodiments.
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TABLE 2 



Clone numbers 15000 through 20000
Libraries: HUVEC
Arranged by ABUNDANCE
Total clones analyzed: 5000
319 genes, for a total of 1713 clones

     number    N  C  entry      s  descriptor
  1  15365    67     HSRPL41       Riboptn L41
  2  15004    65     NCY015004     INCYTE 015004
  3  15638    63     NCY015638     INCYTE 015638
  4  15390    50     NCY015390     INCYTE 015390
  5  15193    47     HSFIB1        Fibronectin
  6  15220    47     RRRPL9     R  Riboptn L9
  7  15280    47     NCY015280     INCYTE 015280
  8  15583    33     M62060        EST HHCH09 (ICR)
  9  15662    31     HSACTCGR      Actin, gamma
 10  15026    29     NCY015026     INCYTE 015026
 11  15279    24     HSEFIAR       Elf 1-alpha
 12  15027    23     NCY015027     INCYTE 015027
 13  15033    20     NCY015033     INCYTE 015033
 14  15198    20     NCY015198     INCYTE 015198
 15  15809    20     HSCOLLI       Collagenase
 16  15221    19     NCY015221     INCYTE 015221
 17  15263    19     NCY015263     INCYTE 015263
 18  15290    19     NCY015290     INCYTE 015290
 19  15350    18     NCY015350     INCYTE 015350
 20  15030    17     NCY015030     INCYTE 015030
 21  15234    17     NCY015234     INCYTE 015234
 22  15459    16     NCY015459     INCYTE 015459
 23  15353    15     NCY015353     INCYTE 015353
 24  15378    15     S76965        Ptn kinase inhib
 25  15255    14     HUMTHYB4      Thymosin beta-4
 26  15401    14     HSLIPCR       Lipocortin I
 27  15425    14     HSPOLYAB      Poly-A bp
 28  18212    14     HUMTHYMA      Thymosin, alpha
 29  18216    14     HSMRP1        Motility relat ptn; MRP-1; CD-9
 30  15189    13     HS18D         Interferon induc ptn 1-8D
 31  15031    12     HUMFKBP       FK506 bp
 32  15306    12     HSH2AZ        Histone H2A
 33  15621    12     HUMLEC        Lectin, B-galbp, 14kDa
 34  15789    11     NCY015789     INCYTE 015789
 35  16578    11     HSRPS11       Riboptn S11
 36  16632    11     M61984        EST HHCA13 (ICR)
 37  18314    11     NCY018314     INCYTE 018314
 38  15367    10     NCY015367     INCYTE 015367
 39  15415    10     HSIFNIN1      Interferon induc mRNA
 40  15633    10     HSLDHAR       Lactate dehydrogenase
 41  15813    10     CHKNMHCB   C  Myosin heavy chain B
 42  18210    10     NCY018210     INCYTE 018210
 43  18233    10     HSRPII140     RNA polymerase II
 44  18996    10     NCY018996     INCYTE 018996
 45  15088     9     HUMFERL       Ferritin, light chain
 46  15714     9     NCY015714     INCYTE 015714
 47  15720     9     NCY015720     INCYTE 015720
 48  15863     9     NCY015863     INCYTE 015863
 49  16121     9     HSET          Endothelin
 50  18252     9     NCY018252     INCYTE 018252
 51  15351     8     HUMALBP       Lipid bp, adipocyte
 52  15370     8     NCY015370     INCYTE 015370
 53  15670     8     BTCIASHI   V  NADH-ubiq oxidoreductase
 54  15795     8     NCY015795     INCYTE 015795
 55  16245     8     NCY016245     INCYTE 016245
 56  18262     8     NCY018262     INCYTE 018262
 57  18321     8     HSRPL17       Riboptn L17
 58  15126     7     XLRPLIBRF     Riboptn L1
 59  15133     7     HSAC07        Actin, beta
 60  15245     7     NCY015245     INCYTE 015245
 61  15288     7     NCY015288     INCYTE 015288
 62  15294     7     HSGAPDR       G-3-PD
 63  15442     7     HUMLAMB       Laminin receptor, 54kDa
 64  15485     7     HSNGMRNA      Uracil DNA glycosylase
 65  16646     7     NCY016646     INCYTE 016646
 66  18003     7     HUMPAIA       Plsmnogen activ gene
 67  15032     6     HUMUB         Ubiquitin
 68  15267     6     HSRPS8        Riboptn S8
 69  15295     6     NCY015295     INCYTE 015295
 70  15458     6     RNRPSIOR   R  Riboptn S10
 71  15832     6     RSGALEM    R  UDP-galactose epimerase
 72  15928     6     HUMAPOJ       Apolipoptn J
 73  16598     6     HUMTBBM40     Tubulin, beta
 74  18218     6     NCY018218     INCYTE 018218
 75  18499     6     HSP27         Hydrophobic ptn p27
 76  18963     5     NCY018963     INCYTE 018963
 77  18997     6     NCY018997     INCYTE 018997
 78  15432     5     HSAGALAR      Galactosidase A, alpha
 79  15475     5     NCY015475     INCYTE 015475
 80  15721     5     NCY015721     INCYTE 015721
 81  15865     5     NCY015865     INCYTE 015865
 82  16270     5     NCY016270     INCYTE 016270
 83  16886     5     NCY016886     INCYTE 016886
 84  18500     5     NCY018500     INCYTE 018500
 85  18503     5     NCY018503     INCYTE 018503
 86  19672     5     RRRPL34    R  Riboptn L34
 87  15086     4     XLRPLIAR   F  Riboptn L1a
 88  15113     4     HUMIFNWRS     tRNA synthetase, trp
 89  15242     4     NCY015242     INCYTE 015242
 90  15249     4     NCY015249     INCYTE 015249
 91  15377     4     NCY015377     INCYTE 015377
 92  15407     4     NCY015407     INCYTE 015407
 93  15473     4     NCY015473     INCYTE 015473
 94  15588     4     HSRPS12       Riboptn S12
 95  15684     4     HSEFIG        Elf 1-gamma
 96  15782     4     NCY015782     INCYTE 015782
 97  15916     4     HSRPS18       Riboptn S18
 98  15930     4     NCY015930     INCYTE 015930
 99  16108     4     NCY016108     INCYTE 016108
100  16133     4     NCY016133     INCYTE 016133

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TABLE 4 



Libraries: THP-1
Subtracting: HHC
Sorted by ABUNDANCE
Total clones analyzed: 7375
1057 genes, for a total of 2151 clones

number  entry      s  descriptor                   bgfreq  rfend    ratio
10022   HUMIL1        IL 1-beta                         0    131   262.00
10036   HSMDNCF       IL-8                              0    119   238.00
10089   HSLAGICDN     Lymphocyte activ gene             0     71   142.00
10060   HUMTCSM       RANTES                            0     23   46.000
10003   HUMMIPIA      MIP-1                             3    121   40.333
10689   HSOP          Osteopontin                       0     20   40.000
11050   NCY011050     INCYTE 011050                     0     17   34.000
10937   HSTNFR        TNF-alpha                         0     17   34.000
10176   HSSOD         Superoxide dismutase              0     14   28.000
10886   HSCDW40       B-cell activ, NGF-relat           0     10   20.000
10186   HUMAPR        Early resp PMA-induc              0      9   18.000
10967   HUMGDN        PN-1, glial-deriv                 0      9   18.000
11353   NCY011353     INCYTE 011353                     0      8   16.000
10298   NCY010298     INCYTE 010298                     0      7   14.000
10215   HUM4COLA      Collagenase, type IV              0      6   12.000
10276   NCY010276     INCYTE 010276                     0      6   12.000
10488   NCY010488     INCYTE 010488                     0      6   12.000
11138   NCY011138     INCYTE 011138                     0      6   12.000
10037   HUMCAPPRO     Adenylate cyclase                 1     10   10.000
10840   HUMADCY       Adenylate cyclase                 0      5   10.000
10672   HSCD44E       Cell adhesion glptn               0      5   10.000
12837   HUMCYCLOX     Cyclooxygenase-2                  0      5   10.000
10001   NCY010001     INCYTE 010001                     0      5   10.000
10005   NCY010005     INCYTE 010005                     0      5   10.000
10294   NCY010294     INCYTE 010294                     0      5   10.000
10297   NCY010297     INCYTE 010297                     0      5   10.000
10403   NCY010403     INCYTE 010403                     0      5   10.000
10699   NCY010699     INCYTE 010699                     0      5   10.000
10966   NCY010966     INCYTE 010966                     0      5   10.000
12092   NCY012092     INCYTE 012092                     0      5   10.000
12549   HSRHOB        Oncogene rho                      0      5   10.000
10691   HUMARFIBA     ADP-ribosylation fctr             0      4    8.000
12106   HSADSS        Adenylosuccinate synthetase       0      4    8.000
10194   HSCATHL       Cathepsin L                       0      4    8.000
10479   CLMCYCA    I  Cyclin A                          0      4    8.000
10031   NCY010031     INCYTE 010031                     0      4    8.000
10203   NCY010203     INCYTE 010203                     0      4    8.000
10288   NCY010288     INCYTE 010288                     0      4    8.000
10372   NCY010372     INCYTE 010372                     0      4    8.000
10471   NCY010471     INCYTE 010471                     0      4    8.000
10484   NCY010484     INCYTE 010484                     0      4    8.000
10859   NCY010859     INCYTE 010859                     0      4    8.000
10890   NCY010890     INCYTE 010890                     0      4    8.000
11511   NCY011511     INCYTE 011511                     0      4    8.000
11868   NCY011868     INCYTE 011868                     0      4    8.000
12820   NCY012820     INCYTE 012820                     0      4    8.000
10133   HSIIRAP       IL-1 antagonist                   0      4    8.000
10516   HUMP2A        Phosphatase, regul 2A             0      4    8.000
11063   HUMS94        TNF-induc response                0      4    8.000
11140   HSHB15RNA     HB15 gene; new Ig                 0      3    6.000
10788   NCY001713     INCYTE 001713                     0      3    6.000
10033   NCY010033     INCYTE 010033                     0      3    6.000
10035   NCY010035     INCYTE 010035                     0      3    6.000
10084   NCY010084     INCYTE 010084                     0      3    6.000
10236   NCY010236     INCYTE 010236                     0      3    6.000
10383   NCY010383     INCYTE 010383                     0      3    6.000
10450   NCY010450     INCYTE 010450                     0      3    6.000
10470   NCY010470     INCYTE 010470                     0      3    6.000
10504   NCY010504     INCYTE 010504                     0      3    6.000
10507   NCY010507     INCYTE 010507                     0      3    6.000
10598   NCY010598     INCYTE 010598                     0      3    6.000
10779   NCY010779     INCYTE 010779                     0      3    6.000
10909   NCY010909     INCYTE 010909                     0      3    6.000
10976   NCY010976     INCYTE 010976                     0      3    6.000
10985   NCY010985     INCYTE 010985                     0      3    6.000
11052   NCY011052     INCYTE 011052                     0      3    6.000
11068   NCY011068     INCYTE 011068                     0      3    6.000
11134   NCY011134     INCYTE 011134                     0      3    6.000
11136   NCY011136     INCYTE 011136                     0      3    6.000
11191   NCY011191     INCYTE 011191                     0      3    6.000
11219   NCY011219     INCYTE 011219                     0      3    6.000
11386   NCY011386     INCYTE 011386                     0      3    6.000
11403   NCY011403     INCYTE 011403                     0      3    6.000
11460   NCY011460     INCYTE 011460                     0      3    6.000
11618   NCY011618     INCYTE 011618                     0      3    6.000
11686   NCY011686     INCYTE 011686                     0      3    6.000
12021   NCY012021     INCYTE 012021                     0      3    6.000
12025   NCY012025     INCYTE 012025                     0      3    6.000
12320   NCY012320     INCYTE 012320                     0      3    6.000
12330   NCY012330     INCYTE 012330                     0      3    6.000
12853   NCY012853     INCYTE 012853                     0      3    6.000
14386   NCY014386     INCYTE 014386                     0      3    6.000
14391   NCY014391     INCYTE 014391                     0      3    6.000

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TABLE 5 



* Master menu for SUBTRACTION output
SET TALK OFF
SET SAFETY OFF
SET EXACT ON
SET TYPEAHEAD TO 0
CLEAR
SET DEVICE TO SCREEN
USE "SmartGuy:FoxBASE+/Mac:fox files:Clones.dbf"
GO TOP
STORE NUMBER TO INITIATE
GO BOTTOM
STORE NUMBER TO TERMINATE
STORE ' ' TO Target1
STORE ' ' TO Target2
STORE ' ' TO Target3
STORE ' ' TO Object1
STORE ' ' TO Object2
STORE ' ' TO Object3
STORE 0 TO ANAL
STORE 0 TO EMATCH
STORE 0 TO HMATCH
STORE 0 TO OMATCH
STORE 0 TO IMATCH
STORE 0 TO PTF
STORE 1 TO BAIL
DO WHILE .T.

* Program.: Subtraction 2.fmt
Eiato.,,,t lom/94 . 

* Version.: FoxBASE+/Mac, revision 1.10
* Notes...: Format file Subtraction 2
*
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",9 COLOR 0,0,0,
@ PIXELS 75,120 TO 178,241 STYLE 3871 COLOR 0,0,-1,-25600,-1,-1
@ PIXELS 27,194 SAY "Subtraction Menu" STYLE 65536 FONT "Geneva",274 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 117,126 GET EMATCH STYLE 65536 FONT "Chicago",12 PICTURE "@*C Exact " SIZE 15,62 CO
@ PIXELS 135,126 GET HMATCH STYLE 65536 FONT "Chicago",12 PICTURE "@*C Homologous" SIZE 15,1
@ PIXELS 153,126 GET OMATCH STYLE 65536 FONT "Chicago",12 PICTURE "@*C Other spc" SIZE 15,84
@ PIXELS 90,152 SAY "Matches:" STYLE 65536 FONT "Geneva",12 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 171,126 GET IMATCH STYLE 65536 FONT "Chicago",12 PICTURE "@*C Incyte" SIZE 15,65 CO
@ PIXELS 252,137 GET initiate STYLE 0 FONT "Geneva",12 SIZE 15,70 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 252,236 GET terminate STYLE 0 FONT "Geneva",12 SIZE 15,70 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 252,35 SAY "Include clones " STYLE 65536 FONT "Geneva",12 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 252,215 SAY "->" STYLE 65536 FONT "Geneva",14 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 198,126 GET PTF STYLE 65536 FONT "Chicago",12 PICTURE "@*C Print to file" SIZE 15,9
@ PIXELS 90,9 TO 181,109 STYLE 3871 COLOR 0,0,-1,-25600,-1,-1
@ PIXELS 90,288 TO 181,397 STYLE 3871 COLOR 0,0,-1,-25600,-1,-1
@ PIXELS 81,296 SAY "Background:" STYLE 65536 FONT "Geneva",270 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 45,135 GET ANAL STYLE 65536 FONT "Chicago",12 PICTURE "@*R Overall;Function" SIZE 4
@ PIXELS 81,16 SAY "Target:" STYLE 65536 FONT "Geneva",270 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 108,20 GET target1 STYLE 0 FONT "Geneva",9 SIZE 12,79 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 135,20 GET target2 STYLE 0 FONT "Geneva",9 SIZE 12,79 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 162,20 GET target3 STYLE 0 FONT "Geneva",9 SIZE 12,79 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 108,299 GET object1 STYLE 0 FONT "Geneva",9 SIZE 12,79 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 135,299 GET object2 STYLE 0 FONT "Geneva",9 SIZE 12,79 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 162,299 GET object3 STYLE 0 FONT "Geneva",9 SIZE 12,79 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 276,324 GET Bail STYLE 65536 FONT "Chicago",12 PICTURE "@*R Run;Bail out" SIZE 4112
*
* EOF: Subtraction 2.fmt
READ

IF Bail=2
  CLEAR
  CLOSE DATABASES
  USE "SmartGuy:FoxBASE+/Mac:fox files:Clones.dbf"
  SET SAFETY ON
  SCREEN 1 OFF
  RETURN
ENDIF

STORE VAL(SYS(2)) TO STARTIME
STORE UPPER(Target1) TO Target1
STORE UPPER(Target2) TO Target2
STORE UPPER(Target3) TO Target3
STORE UPPER(Object1) TO Object1
STORE UPPER(Object2) TO Object2
STORE UPPER(Object3) TO Object3
CLEAR
SET TALK ON
GAP = TERMINATE-INITIATE+1
GO INITIATE
COPY NEXT GAP FIELDS NUMBER,library,D,F,Z,R,ENTRY,S,DESCRIPTOR,START,RFEND,I TO TEMPNUM
USE TEMPNUM
COUNT TO TOT
COPY TO TEMPRED FOR D='E'.OR.D='O'.OR.D='H'.OR.D='N'.OR.D='I'
USE TEMPRED
IF Ematch=0 .AND. Hmatch=0 .AND. Omatch=0 .AND. IMATCH=0
  COPY TO TEMPDESIG
ELSE
  COPY STRUCTURE TO TEMPDESIG
  USE TEMPDESIG
  IF Ematch=1
    APPEND FROM TEMPNUM FOR
  ENDIF
  IF Hmatch=1
    APPEND FROM TEMPNUM FOR D='H'
  ENDIF
  IF Omatch=1
    APPEND FROM TEMPNUM FOR D='O'
  ENDIF
  IF Imatch=1
    APPEND FROM TEMPNUM FOR D='I'.OR.D='X'
    *.OR.D='N'
  ENDIF
ENDIF
COUNT TO STARTOT
COPY STRUCTURE TO TEMPLIB
USE TEMPLIB
APPEND FROM TEMPDESIG FOR library=UPPER(target1)
IF target2<>' '
  APPEND FROM TEMPDESIG FOR library=UPPER(target2)
ENDIF
IF target3<>' '
  APPEND FROM TEMPDESIG FOR library=UPPER(target3)
ENDIF
COUNT TO ANALTOT

USE TEMPDESIG
COPY STRUCTURE TO TEMPSUB
USE TEMPSUB
APPEND FROM TEMPDESIG FOR library=UPPER(Object1)
IF target2<>' '
  APPEND FROM TEMPDESIG FOR library=UPPER(Object2)
ENDIF
IF target3<>' '
  APPEND FROM TEMPDESIG FOR library=UPPER(Object3)
ENDIF
COUNT TO SUBTRACTOT
SET TALK OFF



* COMPRESSION SUBROUTINE A
? 'COMPRESSING QUERY LIBRARY'
USE TEMPLIB
SORT ON ENTRY,NUMBER TO LIBSORT
USE LIBSORT
COUNT TO
REPLACE ALL RFEND WITH 1
MARK1 = 1
DO WHILE SW2=0 ROLL
  IF MARK1 >= IDGENE
    PACK
    COUNT TO AUNIQUE
    LOOP
  ENDIF
  GO MARK1
  DUP = 1
  STORE ENTRY TO TESTA
  STORE D TO DESIGA
  SW = 0
  DO WHILE SW=0 TEST
    SKIP
    STORE ENTRY TO TESTB
    STORE D TO DESIGB
    IF TESTA = TESTB .AND. DESIGA=DESIGB
      DUP = DUP+1
      LOOP
    ENDIF
    GO MARK1
    REPLACE RFEND WITH DUP
    MARK1 = MARK1+DUP
    LOOP
  ENDDO TEST
  LOOP
ENDDO ROLL
SORT ON RFEND/D,NUMBER TO TEMPTARSORT
USE TEMPTARSORT
*REPLACE ALL START WITH RFEND/IDGENE*10000
COUNT TO TEMPTARCO
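
The compression subroutine above appears to sort the clone records by entry and then collapse runs of records that share both an entry and a designation (field D) into a single record whose RFEND field holds the number of duplicates. A minimal Python sketch of that idea follows. The function name `compress`, the dictionary record layout, and the tie-breaking sort on entry are assumptions made for illustration; they are not taken from the listing.

```python
from collections import Counter


def compress(clones):
    """Collapse clones that share (entry, designation) into one row.

    Each input record is a dict with at least the keys "entry" and "d".
    The returned rows carry an "rfend" count, mirroring the RFEND field
    that the listing fills with the duplicate count.
    """
    counts = Counter((c["entry"], c["d"]) for c in clones)
    rows = [{"entry": e, "d": d, "rfend": n} for (e, d), n in counts.items()]
    # Most abundant first; ties broken by entry name (an assumption).
    rows.sort(key=lambda r: (-r["rfend"], r["entry"]))
    return rows


example = [
    {"entry": "HSRHOB", "d": "E"},
    {"entry": "HSCATHL", "d": "H"},
    {"entry": "HSRHOB", "d": "E"},
]
compressed = compress(example)
```

Counting with a hash map avoids the record-pointer bookkeeping (MARK1, DUP) that the database version needs, at the cost of holding all keys in memory.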



* COMPRESSION SUBROUTINE B
? 'COMPRESSING TARGET LIBRARY'
USE TEMPSUB
SORT ON ENTRY,NUMBER TO SUBSORT
USE SUBSORT
COUNT TO SUBGENE
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
  IF MARK1 >= SUBGENE
    PACK
    COUNT TO BUNIQUE
    SW2=1
    LOOP
  ENDIF
  GO MARK1
  DUP = 1
  STORE ENTRY TO TESTA
  STORE D TO DESIGA
  SW = 0
  DO WHILE SW=0 TEST
    SKIP
    STORE ENTRY TO TESTB
    STORE D TO DESIGB
    IF TESTA = TESTB .AND. DESIGA=DESIGB
      LOOP
    ENDIF
    GO MARK1
    REPLACE RFEND WITH DUP
    LOOP
  ENDDO TEST
  LOOP
ENDDO ROLL
SORT ON RFEND/D,NUMBER TO
*REPLACE ALL START WITH RFEND/IDGENE*10000
COUNT TO TEMPSUBCO



? 'SUBTRACTING LIBRARIES'

1751! fi DSra ACgIO^? 

COPY TO CRUNCHER
SELECT 2
USE TEMPSUBSORT
SELECT 1
USE CRUNCHER
APPEND FROM TEMPTARSORT
COUNT TO BAILOUT
MARK = 0
DO WHILE .T.
  MARK = MARK+1
  IF MARK>BAILOUT
    EXIT
  ENDIF
  GO MARK
  STORE ENTRY TO SCANNER
  SELECT 2
  LOCATE FOR ENTRY=SCANNER
  IF FOUND()
    STORE RFEND TO BIT1
    STORE RFEND TO BIT2
  ELSE
    STORE 1/2 TO BIT1
    STORE 0 TO BIT2
  ENDIF
  SELECT 1
  REPLACE BGFREQ WITH BIT2
  REPLACE ACTUAL WITH BIT1
  LOOP
ENDDO
SELECT 1
REPLACE ALL RATIO WITH RFEND/ACTUAL
? 'DOING FINAL SORT BY RATIO'
SORT ON RATIO/D,BGFREQ/D,DESCRIPTOR TO FINAL
USE FINAL
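
The subtraction step above looks up each target entry in the background library, records the background count as BGFREQ, and divides the target count RFEND by that count. Where the entry is absent from the background, the listing appears to divide by 1/2 instead of 0, which agrees with the Table 4 rows in which bgfreq is 0 and ratio is twice rfend (for example rfend 5, ratio 10.000). The Python sketch below illustrates that reading. The function name `subtraction_ratios` and the dictionary record layout are hypothetical.

```python
def subtraction_ratios(target, background):
    """Annotate each target row with bgfreq and ratio, then sort by ratio.

    target, background: lists of dicts with "entry" and "rfend" keys
    (target rows also carry a "descriptor"). Field names follow the
    listing; the layout itself is an assumption.
    """
    bg_counts = {row["entry"]: row["rfend"] for row in background}
    results = []
    for row in target:
        bgfreq = bg_counts.get(row["entry"], 0)
        # Reading of the listing: an entry absent from the background is
        # divided by 1/2 rather than 0, so its ratio is 2 * rfend.
        actual = bgfreq if bgfreq > 0 else 0.5
        results.append({**row, "bgfreq": bgfreq, "ratio": row["rfend"] / actual})
    # Descending ratio, then descending bgfreq, then descriptor.
    results.sort(key=lambda r: (-r["ratio"], -r["bgfreq"], r["descriptor"]))
    return results


rows = subtraction_ratios(
    target=[
        {"entry": "HSCATHL", "descriptor": "Cathepsin L", "rfend": 4},
        {"entry": "HSRHOB", "descriptor": "Oncogene rho", "rfend": 5},
    ],
    background=[],
)
for r in rows:
    print(r["entry"], r["bgfreq"], r["rfend"], f"{r['ratio']:.3f}")
```

With an empty background, the two example rows reproduce the bgfreq, rfend, and ratio values that Table 4 lists for HSRHOB and HSCATHL.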



SET TALK OFF
DO CASE
  CASE PTF=0
    SET DEVICE TO PRINT
    SET PRINT ON
    EJECT
  CASE PTF=1
    SET ALTERNATE TO "Adenoid:Patent Figures:Subtraction.txt"
    SET ALTERNATE ON
ENDCASE
STORE VAL(SYS(2)) TO FINTIME
IF FINTIME<STARTIME
  STORE FINTIME+86400 TO FINTIME
ENDIF
STORE FINTIME - STARTIME TO COMPSEC
STORE COMPSEC/60 TO COMPMIN
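
The timing code above takes two seconds-since-midnight readings and adds 86400 (the number of seconds in a day) to the finish time when it is smaller than the start time, so a run that crosses midnight still yields a positive duration. A small Python sketch of the same arithmetic follows; the function name `elapsed_minutes` is hypothetical, and the sketch assumes the run lasts less than 24 hours.

```python
SECONDS_PER_DAY = 86400


def elapsed_minutes(start_sec, finish_sec):
    """Minutes between two seconds-since-midnight clock readings.

    If the finish reading is smaller than the start reading, the run is
    assumed to have crossed midnight exactly once.
    """
    if finish_sec < start_sec:
        finish_sec += SECONDS_PER_DAY
    return (finish_sec - start_sec) / 60


same_day = elapsed_minutes(3600, 3690)        # 90 seconds within one day
across_midnight = elapsed_minutes(86000, 200)  # 400 s before + 200 s after
```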

■****♦**♦*♦«♦*•***'*»*# 

SET MARGIN TO 10

@1,1 SAY "Library Subtraction Analysis" STYLE 65536 FONT "Geneva",274 COLOR 0,0,0,-1,-1,
?
?
?
?
?? ' '
?? TIME()
? 'Clone numbers '
?? STR(INITIATE,5,0)
?? ' through '
?? STR(TERMINATE,6,0)
? 'Libraries: '
? Target1
IF Target2<>' '
  ?? ', '
  ?? Target2
ENDIF
IF Target3<>' '
  ?? ', '
  ?? Target3
ENDIF
? 'Subtracting: '
? Object1
IF Object2<>' '
  ?? ', '
  ?? Object2
ENDIF
IF Object3<>' '
  ?? ', '
  ?? Object3
ENDIF
? 'Designations: '
IF Ematch=0 .AND. Hmatch=0 .AND. Omatch=0 .AND. IMATCH=0
  ?? 'All'
ENDIF
IF Ematch=1
  ?? 'Exact, '
ENDIF
IF Hmatch=1
  ?? 'Human, '
ENDIF
IF Omatch=1
  ?? 'Other sp.'
ENDIF
IF Imatch=1
  ?? 'INCYTE'
ENDIF
IF ANAL=1
  ? 'Sorted by ABUNDANCE'
ENDIF
IF ANAL=2
  ? 'Arranged by FUNCTION'
ENDIF
? 'Total clones represented: '
?? STR(TOT,5,0)
? 'Total clones analyzed: '
?? STR(STARTOT,5,0)
? 'Total computation time: '
?? STR(COMPMIN,5,2)
?? ' minutes'
?
? 'd = designation f = distribution z = location r = function s = species i = inte
?
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",9 COLOR 0,0,0,
DO CASE
CASE ANAL=1
?? STR(AUNIQUE,4,0)
?? ' genes, for a total of '
?? STR(ANALTOT,4,0)
?? ' clones'
?
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFR
SET PRINT OFF
CLOSE DATABASES
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
CASE ANAL=2
* arrange/function
SET PRINT ON
SET HEADING ON

SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",268 COLOR 0
?
? ' BINDING PROTEINS'
?
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Surface molecules and receptors:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO FOR R='B'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Calcium-binding proteins:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO FOR R='C'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Ligands and effectors:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='S'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Other binding proteins:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='I'
?
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",268 COLOR 0
? ' ONCOGENES'
?
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'General oncogenes:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='O'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'GTP-binding proteins:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='O'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Viral elements:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='V'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Kinases and Phosphatases:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='Y'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Tumor-related antigens:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='A'
?
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",268 COLOR 0
? ' PROTEIN SYNTHETIC MACHINERY PROTEINS'
?
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Transcription and Nucleic Acid-binding proteins:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='0'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Translation:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='T'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Ribosomal proteins:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='R'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Protein processing:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='L'
?
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",268 COLOR 0
? ' ENZYMES'
?
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Ferroproteins:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGF
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Proteases and inhibitors:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='P'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Oxidative phosphorylation:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='Z'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Sugar metabolism:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='Q'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Amino acid metabolism:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='W'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Nucleic acid metabolism:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='N'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Lipid metabolism:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='W'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Other enzymes:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='E'
?
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",268 COLOR 0
?
? ' MISCELLANEOUS CATEGORIES'
?
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Stress response:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='H'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Structural:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='K'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Other clones:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='X'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Helvetica",265 COLOR 0
? 'Clones of unknown function:'
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
list OFF fields number,D,F,Z,R,ENTRY,S,DESCRIPTOR,BGFREQ,RFEND,RATIO,I FOR R='U'

ENDCASE 

DO "Test print.prg"
SET PRINT OFF
SET DEVICE TO SCREEN
CLOSE DATABASES
ERASE TEMPLIB.DBF
ERASE TEMPNUM.DBF
ERASE TEMPDESIG.DBF
SET MARGIN TO 0
CLEAR
LOOP
ENDDO

* Northern (single), version 11-25-94
close databases
SET TALK OFF
SET PRINT OFF
SET EXACT OFF
STORE ' ' TO Eobject
STORE ' ' TO Dobject
STORE 0 TO Numb
STORE 0 TO Zog
STORE 1 TO Bail
DO WHILE .T.
* Program.: Northern (single).fmt
* Date....: 8/ 8/94
* Version.: FoxBASE+/Mac, revision 1.10
* Notes...: Format file Northern (single)
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",12 COLOR 0,0,0
@ PIXELS 15,81 TO 46,397 STYLE 28447 COLOR 0,0,-1,-25600,-1,-1
" " " " ' -1 

COLOR 0,0,0,-1,-1,-1 
- 142 COLOR 0,0,0,-1,-1,-1 
COLOR 0,0,0,-1,-1,-1
241 COLOR 0,0,0,-1,-1,-1 





* EOF: Northern (single).fmt
READ
IF Bail=2
  CLEAR
  screen 1 off
  RETURN
ENDIF
USE "SmartGuy:FoxBASE+/Mac:Fox files:Lookup.dbf"
SET TALK ON
IF Eobject<>' '
  STORE UPPER(Eobject) to Eobject
  SET SAFETY OFF
  SORT ON Entry TO "Lookup entry.dbf"
  SET SAFETY ON
  USE "Lookup entry.dbf"
  LOCATE FOR Look=Eobject
  IF .NOT.FOUND()
    CLEAR
    LOOP
  ENDIF
  BROWSE
  STORE Entry TO Searchval
  CLOSE DATABASES
  ERASE "Lookup entry.dbf"
ENDIF
IF Dobject<>' '
  SET EXACT OFF
  SET SAFETY OFF
  SORT ON descriptor TO "Lookup descriptor.dbf"
  SET SAFETY ON
  USE "Lookup descriptor.dbf"
  LOCATE FOR UPPER(TRIM(descriptor))=UPPER(TRIM(Dobject))
  IF .NOT.FOUND()
    CLEAR




    LOOP
  ENDIF
  BROWSE
  STORE Entry TO Searchval
  CLOSE DATABASES
  ERASE "Lookup descriptor.dbf"
  SET EXACT ON
ENDIF
IF Numb<>0
  USE "SmartGuy:FoxBASE+/Mac:Fox files:clones.dbf"
  GO Numb
  BROWSE
  STORE Entry TO Searchval
ENDIF
? 'Northern analysis for entry '
?? Searchval
? ' '
? 'Enter Y to proceed'
WAIT TO OK
CLEAR
IF UPPER(OK)<>'Y'
  screen 1 off
  RETURN
ENDIF

* COMPRESSION SUBROUTINE FOR Library.dbf
? 'Compressing the Libraries file now...'
USE "SmartGuy:FoxBASE+/Mac:Fox files:libraries.dbf"
SET SAFETY OFF
SORT ON library TO "Compressed libraries.dbf"
* FOR entered>0
SET SAFETY ON
USE "Compressed libraries.dbf"
DELETE FOR entered=0
PACK
COUNT TO TOT
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
  IF MARK1 >= TOT
    PACK
    SW2=1
    LOOP
  ENDIF
  GO MARK1
  STORE library TO TESTA
  SKIP
  STORE library TO TESTB
  IF TESTA = TESTB
  ENDIF
  MARK1 = MARK1+1
  LOOP
ENDDO ROLL

* Northern analysis
CLEAR
? 'Doing the northern now...'
SET TALK ON
USE "SmartGuy:FoxBASE+/Mac:Fox files:clones.dbf"
SET SAFETY OFF
copy TO "Hits.dbf" FOR entry=searchval
SET SAFETY ON

* MASTER ANALYSIS 3; VERSION 12-9-94

* Master menu for analysis output
CLOSE DATABASES
SET TALK OFF
SET SAFETY OFF
CLEAR
SET DEVICE TO SCREEN
SET DEFAULT TO "SmartGuy:FoxBASE+/Mac:fox files:Output programs:"
USE "SmartGuy:FoxBASE+/Mac:fox files:Clones.dbf"
GO TOP
STORE NUMBER TO INITIATE
GO BOTTOM
STORE NUMBER TO TERMINATE
STORE 0 TO ENTIRE
STORE 0 TO CONDEN
STORE 0 TO ANAL
STORE 0 TO EMATCH
STORE 0 TO HMATCH
STORE 0 TO OMATCH
STORE 0 TO IMATCH
STORE 0 TO XMATCH
STORE 0 TO PRINTON
STORE 0 TO PTP
DO WHILE .T.

* Program.: Master analysis.fmt
* Date....: 12/ 9/94
* Version.: FoxBASE+/Mac, revision 1.10
* Notes...: Format file Master analysis
*

SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",9 COLOR 0,0,0,
@ PIXELS 39,255 TO 277,430 STYLE 28447 COLOR 0,0,-1,-25600,-1,-1
@ PIXELS 75,120 TO 178,241 STYLE 3871 COLOR 0,0,-1,-25600,-1,-1
@ PIXELS 27,98 SAY "Customized Output Menu" STYLE 65536 FONT "Geneva",274 COLOR 0,0,-1,-1,-1
@ PIXELS 45,54 GET conden STYLE 65536 FONT "Chicago",12 PICTURE "@*C Condensed format" SIZE
@ PIXELS 54,261 GET anal STYLE 65536 FONT "Chicago",12 PICTURE "@*RV Sort/number;Sort/entry;
@ PIXELS 117,125 GET EMATCH STYLE 65536 FONT "Chicago",12 PICTURE "@*C Exact " SIZE 15,62 CO
@ PIXELS 135,126 GET HMATCH STYLE 65536 FONT "Chicago",12 PICTURE "@*C Homologous" SIZE 15,1
@ PIXELS 153,126 GET OMATCH STYLE 65536 FONT "Chicago",12 PICTURE "@*C Other spc" SIZE 15,84
@ PIXELS 90,152 SAY "Matches:" STYLE 65536 FONT "Geneva",268 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 63,54 GET PRINTON STYLE 65536 FONT "Chicago",12 PICTURE "@*C Include clone listing"
@ PIXELS 171,126 GET Imatch STYLE 65536 FONT "Chicago",12 PICTURE "@*C Incyte" SIZE 15,65 CO
@ PIXELS 252,146 GET initiate STYLE 0 FONT "Geneva",12 SIZE 15,70 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 270,146 GET terminate STYLE 0 FONT "Geneva",12 SIZE 15,70 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 234,134 SAY "Include clones " STYLE 65536 FONT "Geneva",12 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 270,125 SAY "->" STYLE 65536 FONT "Geneva",14 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 198,126 GET PTP STYLE 65536 FONT "Chicago",12 PICTURE "@*C Print to file" SIZE 15,9
@ PIXELS 189,0 TO 257,120 STYLE 3871 COLOR 0,0,-1,-25600,-1,-1
@ PIXELS 209,8 SAY "Library selection" STYLE 65536 FONT "Geneva",266 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 227,18 GET ENTIRE STYLE 65536 FONT "Chicago",12 PICTURE "@*RV All;Selected" SIZE 16
* EOF: Master analysis.fmt
READ

IF ANAL=9
  CLEAR
  CLOSE DATABASES
  ERASE TEMPMASTER.DBF
  USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
  SET SAFETY ON
  SCREEN 1 OFF
  RETURN
ENDIF
CLEAR
? INITIATE
? TERMINATE
? CONDEN
? ANAL
? ematch
? Hmatch
? Omatch
? IMATCH
SET TALK ON
IF ENTIRE=2
  USE "Unique libraries.dbf"
  REPLACE ALL i WITH ' '
  BROWSE FIELDS i,libname,library,total,entered AT 0,0
ENDIF
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
*COPY TO TEMPNUM FOR NUMBER>=INITIATE.AND.NUMBER<=TERMINATE
*USE TEMPNUM
COPY STRUCTURE TO TEMPLIB
USE TEMPLIB
IF ENTIRE=1
  APPEND FROM "SmartGuy:FoxBASE+/Mac:fox files:Clones.dbf"
ENDIF
IF ENTIRE=2
  USE "Unique libraries.dbf"
  COPY TO SELECTED FOR UPPER(i)='Y'
  USE SELECTED
  STORE RECCOUNT() TO STOPIT
  MARK=1
  DO WHILE .T.
    IF MARK>STOPIT
      CLEAR
      EXIT
    ENDIF
    USE SELECTED
    GO MARK
    STORE library TO THISONE
    ? 'COPYING '
    ?? THISONE
    USE TEMPLIB
    APPEND FROM "SmartGuy:FoxBASE+/Mac:fox files:Clones.dbf" FOR library=THISONE
    STORE MARK+1 TO MARK
    LOOP
  ENDDO
ENDIF
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
COUNT TO STARTOT
COPY STRUCTURE TO TEMPDESIG
USE TEMPDESIG
IF Ematch=0 .AND. Hmatch=0 .AND. Omatch=0 .AND. IMATCH=0
  APPEND FROM TEMPLIB
ENDIF
IF Ematch=1
  APPEND FROM TEMPLIB FOR D='E'
ENDIF
IF Hmatch=1
  APPEND FROM TEMPLIB FOR D='H'
ENDIF
IF Omatch=1
  APPEND FROM TEMPLIB FOR D='O'
ENDIF
IF Imatch=1
  APPEND FROM TEMPLIB FOR D='I'.OR.D='X'.OR.D='N'
ENDIF
IF Xmatch=1
  APPEND FROM TEMPLIB FOR D='X'
ENDIF
COUNT TO ANALTOT
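
The block above builds the working set by designation: when no match box is ticked every record is kept, otherwise records are appended for each ticked box according to the D field. The Python sketch below illustrates that selection with a set of wanted codes. The mapping of flags to codes follows the reading of the listing given above; the names `DESIGNATION_CODES` and `select_by_designation` are hypothetical. Unlike repeated APPEND FROM commands, a set never yields the same record twice when two flags share a code.

```python
# Flag name -> designation codes it selects (as read from the listing).
DESIGNATION_CODES = {
    "ematch": {"E"},
    "hmatch": {"H"},
    "omatch": {"O"},
    "imatch": {"I", "X", "N"},
    "xmatch": {"X"},
}


def select_by_designation(records, **flags):
    """Return the records whose "d" code is selected by the set flags.

    A flag counts as set when its value is 1. With no flag set, all
    records are returned, matching the "All" case in the listing.
    """
    wanted = set()
    for name, codes in DESIGNATION_CODES.items():
        if flags.get(name, 0) == 1:
            wanted |= codes
    if not wanted:
        return list(records)
    return [r for r in records if r["d"] in wanted]


records = [{"d": "E"}, {"d": "H"}, {"d": "X"}]
exact_only = select_by_designation(records, ematch=1)
incyte_and_est = select_by_designation(records, imatch=1, xmatch=1)
```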
set talk off
DO CASE
  CASE PTP=0
    SET DEVICE TO PRINT
    SET PRINT ON
  CASE PTP=1
    SET ALTERNATE TO "Total function sort.txt"
    *SET ALTERNATE TO "H and O function sort.txt"
    *SET ALTERNATE TO "Shear Stress HUVEC 2:Abundance sort.txt"
    *SET ALTERNATE TO "Shear Stress HUVEC 2:Abundance con.txt"
    *SET ALTERNATE TO "Shear Stress HUVEC 2:Function sort.txt"
    *SET ALTERNATE TO "Shear Stress HUVEC 2:Distribution sort.txt"
    *SET ALTERNATE TO "Shear Stress HUVEC 1:Clone list.txt"
    *SET ALTERNATE TO "Shear Stress HUVEC 2:Location sort.txt"
    SET ALTERNATE ON
ENDCASE
*********************
IF PRINTON=1
  @1,30 SAY "Database Subset Analysis" STYLE 65536 FONT "Geneva",274 COLOR 0,0,0,-1,-1,-1
ENDIF

?
?
?
? date()
?? ' '
?? TIME()
? 'Clone numbers '
?? STR(INITIATE,5,0)
?? ' through '
?? STR(TERMINATE,6,0)
? 'Libraries: '
IF ENTIRE=1
  ? 'All libraries'
ENDIF
IF ENTIRE=2
  MARK=1
  DO WHILE .T.
    IF MARK>STOPIT
      EXIT
    ENDIF
    USE SELECTED
    GO MARK
    ? ' '
    ?? TRIM(libname)
    STORE MARK+1 TO MARK
    LOOP
  ENDDO
ENDIF
? 'Designations: '
IF Ematch=0 .AND. Hmatch=0 .AND. Omatch=0 .AND. IMATCH=0
  ?? 'All'
ENDIF
IF Ematch=1
  ?? 'Exact, '
ENDIF
IF Hmatch=1
  ?? 'Human, '
ENDIF
IF Omatch=1
  ?? 'Other sp.'
ENDIF
IF Imatch=1
  ?? 'INCYTE'
ENDIF
IF Xmatch=1
  ?? 'EST'
ENDIF
IF CONDEN=1
  ? 'Condensed format analysis'
ENDIF
IF ANAL=1
  ? 'Sorted by NUMBER'
ENDIF
IF ANAL=2
  ? 'Sorted by ENTRY'
ENDIF
IF ANAL=3
  ? 'Arranged by ABUNDANCE'
ENDIF
IF ANAL=4
  ? 'Sorted by INTEREST'
ENDIF
IF ANAL=5
  ? 'Arranged by LOCATION'
ENDIF
IF ANAL=6
  ? 'Arranged by DISTRIBUTION'
ENDIF
IF ANAL=7
  ? 'Arranged by FUNCTION'
ENDIF
? 'Total clones represented: '
?? STR(STARTOT,6,0)
? 'Total clones analyzed: '
?? STR(ANALTOT,6,0)
?
? 'l = library d = designation f = distribution z = location r = function c = cer
?
USE TEMPDESIG

SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
DO CASE
CASE ANAL=1
* sort/number
SET HEADING ON
IF CONDEN=1
  SORT TO TEMP1 ON ENTRY,NUMBER
  DO "COMPRESSION number.PRG"
ELSE
  SORT TO TEMP1 ON NUMBER
  USE TEMP1
  list off fields number,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR
  *list off fields number,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,RFEND,INIT,I
  CLOSE DATABASES
  ERASE TEMP1.DBF
ENDIF

CASE ANAL=2
* sort/DESCRIPTOR
SET HEADING ON
*SORT TO TEMP1 ON DESCRIPTOR,ENTRY,NUMBER/S for D='E'.OR.D='H'.OR.D='O'.OR.D='X'.OR.D='I'
*SORT TO TEMP1 ON ENTRY,DESCRIPTOR,NUMBER/S for D='E'.OR.D='H'.OR.D='O'.OR.D='X'.OR.D='I'
SORT TO TEMP1 ON ENTRY,START/S for D='E'.OR.D='H'.OR.D='O'.OR.D='X'.OR.D='I'
IF CONDEN=1
  DO "COMPRESSION entry.PRG"
ELSE
  list off fields number,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,RFEND,INIT
  CLOSE DATABASES
  ERASE TEMP1.DBF
ENDIF

CASE ANAL=3
* sort abundance
SET HEADING
SORT TO TEMP1 ON ENTRY,NUMBER for D='E'.OR.D='H'.OR.D='O'.OR.D='X'.OR.D='I'
DO "COMPRESSION abundance.PRG"

CASE ANAL=4
* sort/interest
SET HEADING
IF CONDEN=1
  SORT TO TEMP1 ON ENTRY,NUMBER FOR I>0
  DO "COMPRESSION interest.PRG"
ELSE
  SORT ON I/D,ENTRY TO TEMP1 FOR I>1
  USE TEMP1
  list off fields number,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,RFEND,INIT,I
  CLOSE DATABASES
  ERASE TEMP1.DBF
ENDIF

CASE ANAL=5
* arrange/location
SET HEADING ON
STORE 4 TO AMPLIFIER
? 'Nuclear:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,C
IF CONDEN=1
  DO "Compression location.prg"
ELSE
  DO "Normal subroutine 1"
ENDIF
? 'Cytoplasmic:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COM
IF CONDEN=1
  DO "Compression location.prg"
ELSE
  DO "Normal subroutine 1"
ENDIF
? 'Cytoskeleton:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN=1
  DO "Compression location.prg"
ELSE
  DO "Normal subroutine 1"
ENDIF
? 'Cell surface:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN=1
  DO "Compression location.prg"
ELSE
  DO "Normal subroutine 1"
ENDIF
? 'Intracellular membrane:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN=1
  DO "Compression location.prg"
ELSE
  DO "Normal subroutine 1"
ENDIF
? 'Mitochondrial:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN=1
  DO "Compression location.prg"
ELSE
  DO "Normal subroutine 1"
ENDIF
? 'Secreted:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LEN
IF CONDEN=1
  DO "Compression location.prg"
ELSE
  DO "Normal subroutine 1"
ENDIF
? 'Other:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,IN
IF CONDEN=1
  DO "Compression location.prg"
ELSE
  DO "Normal subroutine 1"
ENDIF
? 'Unknown:'
SORT ON NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,IN
IF CONDEN=1
  DO "Compression location.prg"
ELSE
  DO "Normal subroutine 1"
ENDIF
IF CONDEN=1
  SET DEVICE TO PRINTER
  SET PRINTER ON
  EJECT
  DO "Output heading.prg"
  USE "Analysis location.dbf"
  DO "Create bargraph.prg"
  SET HEADING OFF
  ? ' FUNCTIONAL CLASS TOTAL UNIQUE NEW % TOTAL'
  ?
  LIST OFF FIELDS Z,NAME,CLONES,GENES,NEW,PERCENT,GRAPH
  CLOSE DATABASES
  ERASE TEMP2.DBF
  SET HEADING ON
  *USE "SmartGuy:FoxBASE+/Mac:fox files:TEMPMASTER.dbf"
ENDIF

CASE ANAL=6
* arrange/distribution
SET HEADING ON
STORE 3 TO AMPLIFIER

? 'Cell/tissue specific distribution:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN=1
   DO "Compression distrib.prg"
ELSE
   DO "Normal subroutine 1"
ENDIF

? 'Non-specific distribution:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN=1
   DO "Compression distrib.prg"
ELSE
   DO "Normal subroutine 1"
ENDIF

? 'Unknown distribution:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN=1
   DO "Compression distrib.prg"
ELSE
   DO "Normal subroutine 1"
ENDIF

IF CONDEN=1
   SET DEVICE TO PRINTER
   SET PRINTER ON
   EJECT
   DO "Output heading.prg"
   USE "Analysis distribution.dbf"
   DO "Create bargraph.prg"
   SET HEADING OFF
   ? ' FUNCTIONAL CLASS TOTAL UNIQUE % TOTAL'
   ?
   LIST OFF FIELDS P,NAME,CLONES,GENES,PERCENT,GRAPH
   CLOSE DATABASES
   ERASE TEMP2.DBF
   SET HEADING ON
   *USE "SmartGuy:FoxBASE+/Mac:fox files:TEMPMASTER.dbf"
ENDIF

CASE ANAL=7
* arrange/function
SET HEADING ON
STORE 10 TO AMPLIFIER

? ' BINDING PROTEINS'
?
? 'Surface molecules and receptors:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COM
IF CONDEN=1
   DO "Compression function.prg"
ELSE
   DO "Normal subroutine 1"
ENDIF

? 'Calcium-binding proteins:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMM
IF CONDEN=1
   DO "Compression function.prg"
ELSE
   DO "Normal subroutine 1"
ENDIF

? 'Ligands and effectors:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,CO
IF CONDEN=1
   DO "Compression function.prg"
ELSE
   DO "Normal subroutine 1"
ENDIF

? 'Other binding proteins:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT
IF CONDEN=1
   DO "Compression function.prg"
ELSE
   DO "Normal subroutine 1"
ENDIF

*EJECT

? ' ONCOGENES'
?
? 'General oncogenes:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH
IF CONDEN=1
   DO "Compression function.prg"
ELSE
   DO "Normal subroutine 1"
ENDIF

? 'GTP-binding proteins:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COM
IF CONDEN=1
   DO "Compression function.prg"
ELSE
   DO "Normal subroutine 1"
ENDIF

? 'Viral elements:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH
IF CONDEN=1
   DO "Compression function.prg"
ELSE
   DO "Normal subroutine 1"
ENDIF

? 'Kinases and Phosphatases:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT
IF CONDEN=1
   DO "Compression function.prg"
ELSE
   DO "Normal subroutine 1"
ENDIF

? 'Tumor-related antigens:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,IN
IF CONDEN=1
   DO "Compression function.prg"
ELSE
   DO "Normal subroutine 1"
ENDIF

*EJECT

? ' PROTEIN SYNTHETIC MACHINERY PROTEINS'
?
? 'Transcription and Nucleic Acid-binding proteins:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COM
IF CONDEN=1
   DO "Compression function.prg"
ELSE
   DO "Normal subroutine 1"
ENDIF

? 'Translation:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,
IF CONDEN=1
   DO "Compression function.prg"
ELSE
   DO "Normal subroutine 1"
ENDIF

? 'Ribosomal proteins:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COM
IF CONDEN=1
   DO "Compression function.prg"
ELSE
   DO "Normal subroutine 1"
ENDIF

? 'Protein processing:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,CO
IF CONDEN=1
   DO "Compression function.prg"
ELSE
   DO "Normal subroutine 1"
ENDIF

*EJECT

? ' ENZYMES'
?
? 'Ferroproteins:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMM
IF CONDEN=1
   DO "Compression function.prg"
ELSE
   DO "Normal subroutine 1"
ENDIF

? 'Proteases and inhibitors:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN=1
   DO "Compression function.prg"
ELSE
   DO "Normal subroutine 1"
ENDIF

? 'Oxidative phosphorylation:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,L
IF CONDEN=1
   DO "Compression function.prg"
ELSE
   DO "Normal subroutine 1"
ENDIF

? 'Sugar metabolism:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LE
IF CONDEN=1
   DO "Compression function.prg"
ELSE
   DO "Normal subroutine 1"
ENDIF

? 'Amino acid metabolism:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LEN
IF CONDEN=1
   DO "Compression function.prg"
ELSE
   DO "Normal subroutine 1"
ENDIF

? 'Nucleic acid metabolism:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,
IF CONDEN=1
   DO "Compression function.prg"
ELSE
   DO "Normal subroutine 1"
ENDIF

? 'Lipid metabolism:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,CO
IF CONDEN=1
   DO "Compression function.prg"
ELSE
   DO "Normal subroutine 1"
ENDIF

? 'Other enzymes:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH
IF CONDEN=1
   DO "Compression function.prg"
ELSE
   DO "Normal subroutine 1"
ENDIF

*EJECT

? ' MISCELLANEOUS CATEGORIES'
? 'Stress response:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT
IF CONDEN=1
   DO "Compression function.prg"
ELSE
   DO "Normal subroutine 1"
ENDIF

? 'Structural:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COM
IF CONDEN=1
   DO "Compression function.prg"
ELSE
   DO "Normal subroutine 1"
ENDIF

? 'Other clones:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I,COMMEN
IF CONDEN=1
   DO "Compression function.prg"
ELSE
   DO "Normal subroutine 1"
ENDIF

? 'Clones of unknown function:'
SORT ON ENTRY,NUMBER FIELDS RFEND,NUMBER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LEN
IF CONDEN=1
   DO "Compression function.prg"
ELSE
   DO "Normal subroutine 1"
ENDIF

IF CONDEN=1
   EJECT
   *SET DEVICE TO PRINTER
   *SET PRINT ON
   DO "Output heading.prg"
   USE "Analysis function.dbf"
   DO "Create bargraph.prg"
   SET HEADING OFF
   SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",12 COLOR 0,0,0
   ? ' TOTAL TOTAL NEW DIST'
   ? ' FUNCTIONAL CLASS CLONES GENES GENES FUNCTIONAL CLASS'
   *LIST OFF FIELDS P,NAME,CLONES,GENES,NEW,PERCENT,GRAPH,COMPANY
   LIST OFF FIELDS P,NAME,CLONES,GENES,NEW,PERCENT,GRAPH
   CLOSE DATABASES
   ERASE TEMP2.DBF
   SET HEADING ON
   *USE "SmartGuy:FoxBASE+/Mac:fox files:TEMPMASTER.dbf"
ENDIF

CASE ANAL=8
DO "Subgroup summary S.prg"
ENDCASE

DO "Test print.prg"
SET PRINT OFF
SET DEVICE TO SCREEN
CLOSE DATABASES
*ERASE TEMPLIB.DBF
*ERASE TEMPNUM.DBF
*ERASE TEMPDESIG.DBF
*ERASE SELECTED.DBF
CLEAR
LOOP
ENDDO

* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS
USE TEMP1
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
   IF MARK1 >= TOT
      PACK
      COUNT TO UNIQUE
      COUNT TO NEWGENES FOR D='H'.OR.D='O'
      SW2=1
      LOOP
   ENDIF
   GO MARK1
   DUP = 1
   STORE ENTRY TO TESTA
   SW = 0
   DO WHILE SW=0 TEST
      SKIP
      STORE ENTRY TO TESTB
      IF TESTA = TESTB
         DELETE
         DUP = DUP+1
         LOOP
      ENDIF
      GO MARK1
      REPLACE RFEND WITH DUP
      MARK1 = MARK1+DUP
      SW=1
      LOOP
   ENDDO TEST
   LOOP
ENDDO ROLL
*GO TOP
STORE Z TO LOC
USE "Analysis location.dbf"
LOCATE FOR Z=LOC
REPLACE CLONES WITH TOT
REPLACE GENES WITH UNIQUE
REPLACE NEW WITH NEWGENES
USE TEMP1
SORT ON RFEND/D TO TEMP2
USE TEMP2
?? STR(UNIQUE,5,0)
?? ' genes, for a total of '
?? STR(TOT,5,0)
?? ' clones'
? ' V Coincidence'
list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I
*SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE TEMPDESIG

* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS

USE TEMP1
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
   IF MARK1 >= TOT
      PACK
      COUNT TO UNIQUE
      SW2=1
      LOOP
   ENDIF
   GO MARK1
   DUP = 1
   STORE ENTRY TO TESTA
   SW = 0
   DO WHILE SW=0 TEST
      SKIP
      STORE ENTRY TO TESTB
      IF TESTA = TESTB
         DELETE
         DUP = DUP+1
         LOOP
      ENDIF
      GO MARK1
      REPLACE RFEND WITH DUP
      MARK1 = MARK1+DUP
      SW=1
      LOOP
   ENDDO TEST
   LOOP
ENDDO ROLL
*BROWSE
*SET PRINTER ON
SORT ON DATE TO TEMP2
USE TEMP2
?? STR(UNIQUE,4,0)
?? ' genes, for a total of '
?? STR(TOT,4,0)
?? ' clones'
?
? ' V Coincidence'
COUNT TO P4 FOR I=4
IF P4>0
   ? STR(P4,3,0)
   ?? ' genes with priority = 4 (Secondary analysis:)'
   list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT for I=4
   ?
ENDIF
COUNT TO P3 FOR I=3
IF P3>0
   ? STR(P3,3,0)
   ?? ' genes with priority = 3 (Full insert sequence:)'
   list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT for I=3
   ?
ENDIF
COUNT TO P2 FOR I=2
IF P2>0
   ? STR(P2,3,0)
   ?? ' genes with priority = 2 (Primary analysis complete:)'
   list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT for I=2
   ?
ENDIF
COUNT TO P1 FOR I=1
IF P1>0
   ? STR(P1,3,0)
   ?? ' genes with priority = 1 (Primary analysis needed:)'
   list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT for I=1
ENDIF
*SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"

* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS
USE TEMP1
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
DO WHILE SW2=0 ROLL
   IF MARK1 >= TOT
      COUNT TO UNIQUE
      SW2=1
      LOOP
   ENDIF
   GO MARK1
   DUP = 1
   STORE ENTRY TO TESTA
   SW = 0
   DO WHILE SW=0 TEST
      SKIP
      STORE ENTRY TO TESTB
      IF TESTA = TESTB
         DELETE
         DUP = DUP+1
         LOOP
      ENDIF
      GO MARK1
      REPLACE RFEND WITH DUP
      MARK1 = MARK1+DUP
      SW=1
      LOOP
   ENDDO TEST
   LOOP
ENDDO ROLL
*BROWSE
*SET PRINTER ON
SORT ON NUMBER TO TEMP2
USE TEMP2
?? STR(UNIQUE,4,0)
?? ' genes, for a total of '
?? STR(TOT,5,0)
?? ' clones'
? ' V Coincidence'
list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT
*SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"

* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS
USE TEMP1
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
   IF MARK1 >= TOT
      PACK
      COUNT TO UNIQUE
      COUNT TO NEWGENES FOR D='H'.OR.D='O'
      SW2=1
      LOOP
   ENDIF
   GO MARK1
   DUP = 1
   STORE ENTRY TO TESTA
   SW = 0
   DO WHILE SW=0 TEST
      SKIP
      STORE ENTRY TO TESTB
      IF TESTA = TESTB
         DELETE
         DUP = DUP+1
         LOOP
      ENDIF
      GO MARK1
      REPLACE RFEND WITH DUP
      MARK1 = MARK1+DUP
      SW=1
      LOOP
   ENDDO TEST
   LOOP
ENDDO ROLL
GO TOP
STORE R TO FUNC
USE "Analysis function.dbf"
LOCATE FOR P=FUNC
*REPLACE CLONES WITH TOT
REPLACE GENES WITH UNIQUE
REPLACE NEW WITH NEWGENES
USE TEMP1
SORT ON RFEND/D TO TEMP2
USE TEMP2
SET HEADING
?? STR(UNIQUE,5,0)
?? ' genes, for a total of '
?? STR(TOT,5,0)
?? ' clones'
***
? ' V Coincidence'
list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I
*SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",13 COLOR 0,0,
*list off fields RFEND,S,DESCRIPTOR
*SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE TEMPDESIG

* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS
USE TEMP1
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
   IF MARK1 >= TOT
      PACK
      COUNT TO UNIQUE
      SW2=1
      LOOP
   ENDIF
   GO MARK1
   DUP = 1
   STORE ENTRY TO TESTA
   SW = 0
   DO WHILE SW=0 TEST
      SKIP
      STORE ENTRY TO TESTB
      IF TESTA = TESTB
         DELETE
         DUP = DUP+1
         LOOP
      ENDIF
      GO MARK1
      REPLACE RFEND WITH DUP
      MARK1 = MARK1+DUP
      SW=1
      LOOP
   ENDDO TEST
   LOOP
ENDDO ROLL
GO TOP
STORE P TO DIST
USE "Analysis distribution.dbf"
LOCATE FOR P=DIST
REPLACE CLONES WITH TOT
REPLACE GENES WITH UNIQUE
USE TEMP1
sort on rfend/d to TEMP2
USE TEMP2
?? STR(UNIQUE,5,0)
?? ' genes, for a total of '
?? STR(TOT,5,0)
?? ' clones'
? ' V Coincidence'
list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,
*SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE TEMPDESIG

* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS

USE TEMP1
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
   IF MARK1 >= TOT
      PACK
      COUNT TO UNIQUE
      SW2=1
      LOOP
   ENDIF
   GO MARK1
   DUP = 1
   STORE ENTRY TO TESTA
   SW = 0
   DO WHILE SW=0 TEST
      SKIP
      STORE ENTRY TO TESTB
      IF TESTA = TESTB
         DELETE
         DUP = DUP+1
         LOOP
      ENDIF
      GO MARK1
      REPLACE RFEND WITH DUP
      MARK1 = MARK1+DUP
      SW=1
      LOOP
   ENDDO TEST
   LOOP
ENDDO ROLL
GO TOP
USE TEMP1
?? STR(UNIQUE,5,0)
?? ' genes, for a total of '
?? STR(TOT,5,0)
?? ' clones'
? ' V Coincidence'
list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LEN
*SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
USE TEMPDESIG

* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS
USE "SmartGuy:FoxBASE+/Mac:fox files:Clones.dbf"
COPY TO TEMP1 FOR
USE
COUNT TO IDGENE FOR D='E'.OR.D='O'.OR.D='H'.OR.D='N'.OR.D='R'.OR.D='A'
DELETE FOR D='N'.OR.D='D'.OR.D='A'.OR.D='U'.OR.D='S'.OR.D='M'.OR.D='R'.OR.D='V'
PACK
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
DO WHILE SW2=0 ROLL
   IF MARK1 >= TOT
      PACK
      COUNT TO UNIQUE
      SW2=1
      LOOP
   ENDIF
   GO MARK1
   DUP = 1
   STORE ENTRY TO TESTA
   SW = 0
   DO WHILE SW=0 TEST
      SKIP
      STORE ENTRY TO TESTB
      IF TESTA = TESTB
         DELETE
         DUP = DUP+1
         LOOP
      ENDIF
      GO MARK1
      REPLACE RFEND WITH DUP
      MARK1 = MARK1+DUP
      SW=1
      LOOP
   ENDDO TEST
   LOOP
ENDDO ROLL
*BROWSE
*SET PRINTER ON
SORT ON RFEND/D,NUMBER TO TEMP2
USE TEMP2
REPLACE ALL START WITH RFEND/IDGENE*10000
?? STR(UNIQUE,5,0)
?? ' genes, for a total of '
?? STR(TOT,5,0)
?? ' clones'
? ' Coincidence V V Clones/10000'
set heading off
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
list fields number,RFEND,START,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,INIT,I
*SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"

* COMPRESSION SUBROUTINE FOR ANALYSIS PROGRAMS

USE TEMP1
COUNT TO IDGENE FOR D='E'.OR.D='O'.OR.D='H'.OR.D='N'.OR.D='R'.OR.D='A'
DELETE FOR D='N'.OR.D='D'.OR.D='A'.OR.D='U'.OR.D='S'.OR.D='M'.OR.D='R'.OR.D='V'
PACK
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0
DO WHILE SW2=0 ROLL
   IF MARK1 >= TOT
      PACK
      COUNT TO UNIQUE
      SW2=1
      LOOP
   ENDIF
   GO MARK1
   DUP = 1
   STORE ENTRY TO TESTA
   SW = 0
   DO WHILE SW=0 TEST
      SKIP
      STORE ENTRY TO TESTB
      IF TESTA = TESTB
         DELETE
         DUP = DUP+1
         LOOP
      ENDIF
      GO MARK1
      REPLACE RFEND WITH DUP
      MARK1 = MARK1+DUP
      SW=1
      LOOP
   ENDDO TEST
   LOOP
ENDDO ROLL
*BROWSE
*SET PRINTER ON
SORT ON RFEND/D,NUMBER TO TEMP2
USE TEMP2
REPLACE ALL START WITH RFEND/IDGENE*10000
?? STR(UNIQUE,5,0)
?? ' genes, for a total of '
?? STR(TOT,5,0)
?? ' clones'
? ' Coincidence V V Clones/10000'
set heading off
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
list fields number,RFEND,START,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,INIT,I
*SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"

USE TEMP1
COUNT TO TOT
?? ' Total of'
?? STR(TOT,4,0)
?? ' clones'
?
*list off fields number,L,D,F,Z,R,C,ENTRY,DESCRIPTOR,LENGTH,RFEND,
list off fields number,L,D,F,Z,R,C,ENTRY,DESCRIPTOR
CLOSE DATABASES
ERASE TEMP1.DBF
USE TEMPDESIG

*Lifeseq menu; version 8-7-94

SET TALK OFF
set device to screen
CLEAR
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
STORE LUPDATE() TO Update
GO BOTTOM
STORE RECNO() TO cloneno
STORE 6 TO Chooser
DO WHILE .T.
* Program.: Lifeseq menu.fmt
* Date....: 1/11/95
* Version.: FoxBASE+/Mac, revision 1.10
* Notes...: Format file Lifeseq menu
*
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",268 COLOR 0,0,
@ PIXELS 18,126 TO 77,365 STYLE 28479 COLOR 32767,-25600,-1,-16223,-16721,-15725
@ PIXELS 110,29 TO 188,217 STYLE 3871 COLOR 0,0,-1,-25600,-1,-1
@ PIXELS 45,161 SAY "LIFESEQ" STYLE 65536 FONT "Geneva",536 COLOR 0,0,-1,-1,7135,5884
@ PIXELS 36,269 SAY "TM" STYLE 65536 FONT "Geneva",12 COLOR 0,0,-1,-1,7135,5884
@ PIXELS 63,143 SAY "Molecular Biology Desktop" STYLE 65536 FONT "Helvetica",18 COLOR 0,0,0,
@ PIXELS 90,252 TO 251,467 STYLE 28447 COLOR 0,0,-1,-25600,-1,-1
@ PIXELS 117,270 GET Chooser STYLE 65536 FONT "Chicago",12 PICTURE "@*RV Transcript profiles
@ PIXELS 135,128 SAY Update STYLE 0 FONT "Geneva",12 SIZE 15,79 COLOR 0,0,0,-25600,-1,-1
@ PIXELS 171,128 SAY cloneno STYLE 0 FONT "Geneva",12 SIZE 15,79 COLOR 0,0,0,-25600,-1,-1
@ PIXELS 135,44 SAY "Last update:" STYLE 65536 FONT "Geneva",12 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 171,44 SAY "Total clones:" STYLE 65536 FONT "Geneva",12 COLOR 0,0,-1,-1,-1,-1
@ PIXELS 45,296 SAY "v1.50" STYLE 65536 FONT "Geneva",782 COLOR 0,0,-1,-1,-1,-1
* EOF: Lifeseq menu.fmt
READ
DO CASE
CASE Chooser=1
   DO "SmartGuy:FoxBASE+/Mac:fox files:Output programs:Master analysis 3.prg"
CASE Chooser=2
   DO "SmartGuy:FoxBASE+/Mac:fox files:Output programs:Subtraction 2.prg"
CASE Chooser=3
   DO "SmartGuy:FoxBASE+/Mac:fox files:Output programs:Northern (single).prg"
CASE Chooser=4
   USE "Libraries.dbf"
   BROWSE
CASE Chooser=5
   DO "SmartGuy:FoxBASE+/Mac:fox files:Output programs:See individual clone.prg"
CASE Chooser=6
   DO "SmartGuy:FoxBASE+/Mac:fox files:Libraries:Output programs:Menu.prg"
CASE Chooser=7
   CLEAR
   SCREEN 1 OFF
   RETURN
ENDCASE
LOOP
ENDDO

@ 1,30 SAY "Database Subset Analysis" STYLE 65536 FONT "Geneva",274 COLOR 0,0,0,-1,-1,-1

? ' '
?
?
?
? DATE()
?? ' '
?? TIME()
? 'Clone numbers '
?? STR(INITIATE,6,0)
?? ' through '
?? STR(TERMINATE,6,0)
? 'Libraries: '
IF ENTIRE=1
   ? 'All libraries'
ENDIF
IF ENTIRE=2
   MARK=1
   DO WHILE .T.
      IF MARK>STOPIT
         EXIT
      ENDIF
      USE SELECTED
      GO MARK
      ? ' '
      ?? TRIM(libname)
      STORE MARK+1 TO MARK
      LOOP
   ENDDO
ENDIF
? 'Designations: '
IF Ematch=0 .AND. Hmatch=0 .AND. Omatch=0
   ?? 'All'
ENDIF
IF Ematch=1
   ?? 'Exact,'
ENDIF
IF Hmatch=1
   ?? 'Human,'
ENDIF
IF Omatch=1
   ?? 'Other sp.'
ENDIF
IF CONDEN=1
   ? 'Condensed format analysis'
ENDIF
IF ANAL=1
   ? 'Sorted by number'
ENDIF
IF ANAL=2
   ? 'Sorted by ENTRY'
ENDIF
IF ANAL=3
   ? 'Arranged by ABUNDANCE'
ENDIF
IF ANAL=4
   ? 'Sorted by INTEREST'
ENDIF
IF ANAL=5
   ? 'Arranged by LOCATION'
ENDIF
IF ANAL=6
   ? 'Arranged by DISTRIBUTION'
ENDIF
IF ANAL=7
   ? 'Arranged by FUNCTION'
ENDIF
? 'Total clones represented: '
?? STR(STARTOT,6,0)
? 'Total clones analyzed: '
?? STR(ANALTOT,6,0)
?
?

USE TEMP1
COUNT TO TOT
?? ' Total of'
?? STR(TOT,4,0)
?? ' clones'
?
*list off fields number,L,D,F,Z,R,C,ENTRY,DESCRIPTOR,LENGTH,RFEND
list off fields number,L,D,F,Z,R,C,ENTRY,DESCRIPTOR
CLOSE DATABASES
ERASE TEMP1.DBF
USE TEMPDESIG

USE TEMP1
COUNT TO TOT
?? ' Total of'
?? STR(TOT,4,0)
?? ' clones'
?
*list off fields number,L,D,F,Z,R,C,ENTRY,DESCRIPTOR,LENGTH,RFEND
list off fields number,L,D,F,Z,R,C,ENTRY,DESCRIPTOR
CLOSE DATABASES
ERASE TEMP1.DBF
USE TEMPDESIG

*Northern (single), version 11-25-94
close databases
SET TALK OFF
SET PRINT OFF
SET EXACT OFF
STORE ' ' TO Eobject
STORE ' ' TO Dobject
STORE 0 TO Numb
STORE 0 TO Zog
STORE 1 TO Bail
DO WHILE .T.
* Program.: Northern (single).fmt
* Date....: 8/ 8/94
* Version.: FoxBASE+/Mac, revision 1.10
* Notes...: Format file Northern (single)
*
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",12 COLOR 0,0,0
@ PIXELS 15,81 TO 46,3&7 STYLE 28447 COLOR 0,0,-1,-25600,-1,-1
@ PIXELS 89,79 TO 192,422 STYLE 28447 COLOR 0,0,0,-25600,-1,-1
@ PIXELS 115,98 SAY "Entry" STYLE 65536 FONT "Geneva",12 COLOR 0,0,0,-1,-1,-1
@ PIXELS 115,173 GET Eobject STYLE 0 FONT "Geneva",12 SIZE 15,142 COLOR 0,0,0,-1,-1,-1
@ PIXELS 145,89 SAY "Description" STYLE 65536 FONT "Geneva",12 COLOR 0,0,0,-1,-1,-1
@ PIXELS 145,173 GET Dobject STYLE 0 FONT "Geneva",12 SIZE 15,241 COLOR 0,0,0,-1,-1,-1
@ PIXELS 35,89 SAY "Single Northern search screen" STYLE 65536 FONT "Geneva",274 COLOR 0,0,-
@ PIXELS 220,162 GET Bail STYLE 65536 FONT "Chicago",12 PICTURE "@*R Continue;Bail out" SIZE
@ PIXELS 175,98 SAY "Clone #:" STYLE 65536 FONT "Geneva",12 COLOR 0,0,0,-1,-1,-1
@ PIXELS 175,173 GET Numb STYLE 0 FONT "Geneva",12 SIZE 15,70 COLOR 0,0,0,-1,-1,-1
*@ PIXELS 80,152 SAY "Enter any ONE of the following:" STYLE 65536 FONT "Geneva",12 COLOR -1,
* EOF: Northern (single).fmt
READ
IF Bail=2
   CLEAR
   screen 1 off
   RETURN
ENDIF
USE "SmartGuy:FoxBASE+/Mac:Fox files:Lookup.dbf"
SET TALK ON
IF Eobject<>' '
   STORE UPPER(Eobject) to Eobject
   SET SAFETY OFF
   SORT ON Entry TO "Lookup entry.dbf"
   SET SAFETY ON
   USE "Lookup entry.dbf"
   LOCATE FOR Look=Eobject
   IF .NOT.FOUND()
      CLEAR
      LOOP
   ENDIF
   BROWSE
   STORE Entry TO Searchval
   CLOSE DATABASES
   ERASE "Lookup entry.dbf"
ENDIF

IF Dobject<>' '
   SET EXACT OFF
   SET SAFETY OFF
   SORT ON descriptor TO "Lookup descriptor.dbf"
   SET SAFETY ON
   USE "Lookup descriptor.dbf"
   LOCATE FOR UPPER(TRIM(descriptor))=UPPER(TRIM(Dobject))
   IF .NOT.FOUND()
      CLEAR
      LOOP
   ENDIF
   BROWSE
   STORE Entry TO Searchval
   CLOSE DATABASES
   ERASE "Lookup descriptor.dbf"
   SET EXACT ON
ENDIF
IF Numb<>0
   USE "SmartGuy:FoxBASE+/Mac:Fox files:clones.dbf"
   GO Numb
   BROWSE
   STORE Entry TO Searchval
ENDIF
CLEAR
? 'Northern analysis for entry '
?? Searchval
? 'Enter Y to proceed'
WAIT TO OK
CLEAR
IF UPPER(OK)<>'Y'
   screen 1 off
   RETURN
ENDIF
* COMPRESSION SUBROUTINE FOR Library.dbf
? 'Compressing the Libraries file now...'
USE "SmartGuy:FoxBASE+/Mac:Fox files:libraries.dbf"
SET SAFETY OFF
SORT ON library TO "Compressed libraries.dbf"
* FOR entered>0
SET SAFETY ON
USE "Compressed libraries.dbf"
DELETE FOR entered=0
PACK
COUNT TO TOT
MARK1 = 1
DO WHILE SW2=0 ROLL
   IF MARK1 >= TOT
      PACK
      SW2=1
      LOOP
   ENDIF
   GO MARK1
   STORE library TO TESTA
   SKIP
   STORE library TO TESTB
   IF TESTA = TESTB
      DELETE
   ENDIF
   MARK1 = MARK1+1
   LOOP
ENDDO ROLL

* Northern analysis
CLEAR
? 'Doing the northern now...'
SET TALK ON
USE "SmartGuy:FoxBASE+/Mac:Fox files:clones.dbf"
SET SAFETY OFF
COPY TO "Hits.dbf" FOR entry=searchval
SET SAFETY ON
CLOSE DATABASES
SELECT 1
USE "Compressed libraries.dbf"
STORE RECCOUNT() TO Entries
SELECT 2
USE "Hits.dbf"
DO WHILE .T.
   SELECT 1
   IF Mark>Entries
      EXIT
   ENDIF
   GO MARK
   STORE library TO Jigger
   SELECT 2
   COUNT TO Zog FOR library=Jigger
   SELECT 1
   REPLACE hits with Zog
   Mark=Mark+1
   LOOP
ENDDO
SELECT 1
BROWSE FIELDS LIBRARY,LIBNAME,ENTERED,HITS AT 0,0
CLEAR
? 'Enter Y to print:'
WAIT TO PRINSET
IF UPPER(PRINSET)='Y'
   SET PRINT ON
   CLEAR
   EJECT
   SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",14 COLOR 0,0,0
   ? 'DATABASE ENTRIES MATCHING ENTRY '
   ?? Searchval
   ? DATE()
   ?
   SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
   LIST OFF FIELDS library,libname,entered,hits
   ?
   ?
   SELECT 2
   LIST OFF FIELDS NUMBER,LIBRARY,D,S,F,Z,R,ENTRY,DESCRIPTOR,RFSTART,START,RFEND
   SET TALK OFF
   SET PRINT OFF
ENDIF
CLOSE DATABASES
SET TALK OFF
CLEAR
DO "Test print.prg"
RETURN

TABLE 6



library
ADENINB01
ADRENOR01
ADRENOT01
AMLBNOT01
BMARNOT01
BMARNOT02
CARDNOT01
CHAONOT01
CORNNOT01
FIBRAGT01
FIBRAGT02
FIBRANT01
FIBRNGT01
FIBRNOT01
FIBRNOT02
HMC1NOT01
HUVELPB01
HUVENOB01
HUVESTB01
HYPONOB01
KIDNNOT01
LIVRNOT01
LUNGNOT01
MUSCNOT01
OVIDNOB01
PANCNOT01
PITUNOR01
PITUNOT01
PLACNOB01
SPLNFET01
SPLNNOT02
STOMNOT01
SYNORAB01
TBLYNOT01
TESTNOT01
THP1NOB01
THP1PEB01
THP1PLB01
U937NOT01

libname
Inflamed adenoid
Adrenal gland (r)
Adrenal gland (T)
AML blast cells (T)
Bone marrow
Bone marrow (T)
Cardiac muscle (T)
Chin. hamster ovary
Corneal stroma
Fibroblast, AT 5
Fibroblast, AT 30
Fibroblast, AT
Fibroblast, uv 5
Fibroblast, uv 30
Fibroblast
Fibroblast, normal
Mast cell line HMC-1
HUVEC IFN,TNF,LPS
HUVEC control
HUVEC shear stress
Hypothalamus
Kidney (T)
Liver (T)
Lung (T)
Skeletal muscle (T)
Oviduct
Pancreas, normal
Pituitary (r)
Pituitary (T)
Placenta
Small intestine (T)
Spleen/liver, fetal
Spleen (T)
Stomach
Rheum. synovium
T + B lymphoblast
Testis (T)
THP-1 control
THP phorbol
THP-1 phorbol LPS
U937, monocytic leuk



number  library    d s f z r  entry    descriptor                rfstart  start  rfend
2304    U937NOT01  E H C C T  HUMEF1B  Elongation factor 1-beta  0        0      773
3240    HMC1NOT01  E H C C T  HUMEF1B  Elongation factor 1-beta  0        370    773
3269    HMC1NOT01  E H C C T  HUMEF1B  Elongation factor 1-beta  0        371    773
?93     HMC1NOT01  E H C C T  HUMEF1B  Elongation factor 1-beta  0        470    773
8989    HMC1NOT01  E H C C T  HUMEF1B  Elongation factor 1-beta  0        327    773
9139    HMC1NOT01  E H C C T  HUMEF1B  Elongation factor 1-beta  0        375    773

WHAT IS CLAIMED IS:

1. A method of analyzing a specimen containing gene
transcripts, said method comprising the steps of:

(a) producing a library of biological sequences;

(b) generating a set of transcript sequences, where
each of the transcript sequences in said set is indicative
of a different one of the biological sequences of the
library;

(c) processing the transcript sequences in a
programmed computer in which a database of reference
transcript sequences indicative of reference biological
sequences is stored, to generate an identified sequence
value for each of the transcript sequences, where each said
identified sequence value is indicative of a sequence
annotation and a degree of match between one of the
transcript sequences and at least one of the reference
transcript sequences; and

(d) processing each said identified sequence value to
generate final data values indicative of a number of times
each identified sequence value is present in the library.
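
As an illustration only, step (d) of claim 1 can be pictured as a simple tally. The Python sketch below is not part of the appended FoxBASE+ listing; it assumes each identified sequence value can be modeled as an (annotation, degree-of-match) pair, and the name `tally_identified_values` and the entry `GENE_X` are hypothetical.

```python
from collections import Counter


def tally_identified_values(identified_values):
    """Count how often each identified sequence value occurs in a library.

    Assumption: each value is an (annotation, match_degree) tuple,
    one tuple per transcript sequence in the library.
    """
    return Counter(identified_values)


# Hypothetical library of five sequenced clones.
library = [
    ("HUMEF1B", "E"),
    ("HUMEF1B", "E"),
    ("GENE_X", "H"),
    ("HUMEF1B", "E"),
    ("GENE_X", "H"),
]
final_values = tally_identified_values(library)
```

Here `final_values` maps each identified sequence value to the number of times it is present in the library.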

2. The method of claim 1, wherein step (a) includes
the steps of:

obtaining a mixture of mRNA;
making cDNA copies of the mRNA;
isolating a representative population of clones
transfected with the cDNA and producing therefrom the
library of biological sequences.

3. The method of claim 1, wherein the biological
sequences are cDNA sequences.

4. The method of claim 1, wherein the biological
sequences are RNA sequences.

5. The method of claim 1, wherein the biological
sequences are protein sequences.

6. The method of claim 1, wherein a first value of
said degree of match is indicative of an exact match, and a
second value of said degree of match is indicative of a
non-exact match.

7. A method of comparing two specimens containing
gene transcripts, said method comprising:

(a) analyzing a first specimen according to the
method of claim 1;

(b) producing a second library of biological
sequences;

(c) generating a second set of transcript sequences,
where each of the transcript sequences in said second set
is indicative of a different one of the biological
sequences of the second library;

(d) processing the second set of transcript sequences
in said programmed computer to generate a second set of
identified sequence values known as further identified
sequence values, where each of the further identified
sequence values is indicative of a sequence annotation and
a degree of match between one of the biological sequences
of the second library and at least one of the reference
sequences;

(e) processing each said further identified sequence
value to generate further final data values indicative of a
number of times each further identified sequence value is
present in the second library; and

(f) processing the final data values from the first
specimen and the further identified sequence values from
the second specimen to generate ratios of transcript
sequences, each of said ratio values indicative of
differences in numbers of gene transcripts between the two
specimens.
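
To make the ratio step (f) concrete, the following Python sketch divides per-entry counts from a first library by those from a second. It is a hedged illustration rather than the claimed implementation: the function name `abundance_ratios`, the choice to return `None` when the second count is zero, and the entries `GENE_X` and `GENE_Y` are all assumptions.

```python
def abundance_ratios(first_counts, second_counts):
    """Return first/second count ratios for every entry seen in either library.

    Assumption: a ratio is reported as None when the entry is absent
    from the second library, to avoid dividing by zero.
    """
    ratios = {}
    for entry in set(first_counts) | set(second_counts):
        a = first_counts.get(entry, 0)
        b = second_counts.get(entry, 0)
        ratios[entry] = a / b if b else None
    return ratios


# Hypothetical final data values for two specimens.
first = {"HUMEF1B": 6, "GENE_X": 2}
second = {"HUMEF1B": 3, "GENE_Y": 1}
ratios = abundance_ratios(first, second)
```

A ratio above 1 suggests the transcript is more abundant in the first specimen; how absent entries should be treated is a design choice the claim does not fix.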



8. A method of quantifying relative abundance of mRNA
in a biological specimen, said method comprising the steps
of:

(a) isolating a population of mRNA transcripts from
the biological specimen;

(b) identifying genes from which the mRNA was
transcribed by a sequence-specific method;

(c) determining numbers of mRNA transcripts
corresponding to each of the genes; and

(d) using the mRNA transcript numbers to determine
the relative abundance of mRNA transcripts within the
population of mRNA transcripts.
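
Steps (c) and (d) of claim 8 amount to dividing each gene's transcript count by the total. A minimal Python sketch follows, assuming the counts are already available as a dictionary; `relative_abundance` and the gene names other than HUMEF1B are hypothetical.

```python
def relative_abundance(transcript_counts):
    """Convert per-gene transcript counts into fractions of the total population."""
    total = sum(transcript_counts.values())
    if total == 0:
        return {}
    return {gene: n / total for gene, n in transcript_counts.items()}


# Hypothetical counts for a population of ten transcripts.
counts = {"HUMEF1B": 6, "GENE_X": 3, "GENE_Y": 1}
abundance = relative_abundance(counts)
```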

9. A diagnostic method which comprises producing a
gene transcript image, said method comprising the steps of:

(a) isolating a population of mRNA transcripts from a
biological specimen;

(b) identifying genes from which the mRNA was
transcribed by a sequence-specific method;

(c) determining numbers of mRNA transcripts
corresponding to each of the genes; and

(d) using the mRNA transcript numbers to determine
the relative abundance of mRNA transcripts within the
population of mRNA transcripts, where data determining the
relative abundance values of mRNA transcripts is the gene
transcript image of the biological specimen.

10. The method of claim 9, further comprising:

(e) providing a set of standard normal and diseased
gene transcript images; and

(f) comparing the gene transcript image of the
biological specimen with the gene transcript images of step
(e) to identify at least one of the standard gene
transcript images which most closely approximate the gene
transcript image of the biological specimen.

11. The method of claim 9, wherein the biological
specimen is biopsy tissue, sputum, blood or urine.

12. A method of producing a gene transcript image, 
said method comprising the steps of 

(a) obtaining a mixture of mRNA; 

(b) making cDNA copies of the mRNA; 

(c) inserting the cDNA into a suitable vector and 
using said vector to transfect suitable host strain cells 
which are plated out and permitted to grow into clones, 
each clone representing a unique mRNA; 

(d) isolating a representative population of 
recombinant clones; 

(e) identifying amplified cDNAs from each clone in 
the population by a sequence-specific method which 
identifies gene from which the unique mRNA was transcribed; 

(f) determining a number of times each gene is 
represented within the population of clones as an 
indication of relative abundance; and 

(g) listing the genes and their relative abundance in 
order of abundance, thereby producing the gene transcript 
image. 

13. The method of claim 12, also including the step 
of diagnosing disease by: 

repeating steps (a) through (g) on biological 
specimens from random sample of normal and diseased humans, 
encompassing a variety of diseases, to produce reference 
sets of normal and diseased gene transcript images; 

obtaining a test specimen from a human, and producing 
a test gene transcript image by performing steps (a) 
through (g) on said test specimen; 

comparing the test gene transcript image with the 
reference sets of gene transcript images; and 

identifying at least one of the reference gene 
transcript images which most closely approximates the test 
gene transcript image. 

14. A computer system for analyzing a library of 
biological sequences, said system including: 

means for receiving a set of transcript sequences, 
where each of the transcript sequences is indicative of a 
different one of the biological sequences of the library; 
and 

means for processing the transcript sequences in the 
computer system in which a database of reference transcript 
sequences indicative of reference biological sequences is 
stored, wherein the computer is programmed with software 
for generating an identified sequence value for each of the 
transcript sequences, where each said identified sequence 
value is indicative of a sequence annotation and a degree 
of match between a different one of the biological 
sequences of the library and at least one of the reference 
transcript sequences, and for processing each said 
identified sequence value to generate final data values 
indicative of a number of times each identified sequence 
value is present in the library. 

15. The system of claim 14, also including: 
library generation means for producing the library of 
biological sequences and generating said set of transcript 
sequences from said library. 

16. The system of claim 15, wherein the library 
generation means includes: 

means for obtaining a mixture of mRNA; 

means for making cDNA copies of the mRNA; 

means for inserting the cDNA copies into cells and 
permitting the cells to grow into clones; 

means for isolating a representative population of the 
clones and producing therefrom the library of biological 
sequences. 
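
The counting and comparison steps recited in the claims above (determining how many times each gene is represented in a library, listing genes in order of abundance, and forming ratios between two specimens) can be illustrated with a minimal sketch. It is offered only as an illustration and is not the software described in the application. The function names, the input format (one gene name per sequenced clone), and the choice to compare relative abundances rather than raw counts are assumptions made here.

```python
from collections import Counter


def transcript_image(identified_genes):
    """Rank genes by how often they were identified in one library.

    `identified_genes` holds one gene name per sequenced clone (a
    hypothetical input format). Returns (gene, count, fraction) rows,
    most abundant first.
    """
    counts = Counter(identified_genes)
    total = sum(counts.values())
    rows = [(gene, n, n / total) for gene, n in counts.items()]
    # Descending count; ties broken alphabetically for a stable listing.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows


def abundance_ratios(image_first, image_second):
    """Ratio of relative abundance (first / second) for genes in both images."""
    first = {gene: frac for gene, _, frac in image_first}
    second = {gene: frac for gene, _, frac in image_second}
    return {gene: first[gene] / second[gene] for gene in first if gene in second}


if __name__ == "__main__":
    # Made-up clone identifications for two specimens.
    library_1 = ["actin", "actin", "actin", "catalase"]
    library_2 = ["actin", "catalase", "catalase", "catalase"]
    image_1 = transcript_image(library_1)
    image_2 = transcript_image(library_2)
    for gene, count, fraction in image_1:
        print(f"{gene}\t{count}\t{fraction:.2f}")
    print(abundance_ratios(image_1, image_2))
```

Genes present in only one library are omitted from the ratios in this sketch; how such genes should be treated is a design decision the claims leave open.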
Quantitative Monitoring of Gene Expression 
Patterns with a Complementary DNA Microarray 

Mark Schena,* Dari Shalon,*† Ronald W. Davis, 
Patrick O. Brown* 

A high-capacity system was developed to monitor the expression of many genes in 
parallel. Microarrays prepared by high-speed robotic printing of complementary DNAs on 
glass were used for quantitative expression measurements of the corresponding genes. 
Because of the small format and high density of the arrays, hybridization volumes of 2 
microliters could be used that enabled detection of rare transcripts in probe mixtures 
derived from 2 micrograms of total cellular messenger RNA. Differential expression 
measurements of 45 Arabidopsis genes were made by means of simultaneous, two-color 
fluorescence hybridization. 



The temporal, developmental, topographical, histological, and physiological patterns 
in which a gene is expressed provide clues to its biological role. The large and 
expanding database of complementary DNA (cDNA) sequences from many organisms (1) 
presents the opportunity of defining these patterns at the level of the whole genome. 

For these studies, we used the small flowering plant Arabidopsis thaliana as a model 
organism. Arabidopsis possesses many advantages for gene expression analysis, 
including the fact that it has the smallest genome of any higher eukaryote examined 
to date (2). Forty-five cloned Arabidopsis cDNAs (Table 1), including 14 complete 
sequences and 31 expressed sequence tags (ESTs), were used as gene-specific targets. 
We obtained the ESTs by selecting cDNA clones at random from an Arabidopsis 
cDNA library. Sequence analysis revealed that 28 of the 31 ESTs matched sequences 
in the database (Table 1). Three additional cDNAs from other organisms served as 
controls in the experiments. 

The 48 cDNAs, averaging ~1.0 kb, were amplified with the polymerase chain 
reaction (PCR) and deposited into individual wells of a 96-well microtiter plate. 
Each sample was duplicated in two adjacent wells to allow the reproducibility of 
the arraying and hybridization process to be tested. Samples from the microtiter 
plate were printed onto glass microscope slides in an area measuring 3.5 mm by 5.5 
mm with the use of a high-speed arraying machine (3). The arrays were processed by 
chemical and heat treatment to attach the DNA sequences to the glass surface and 
denature them (3). Three arrays, printed in a single lot, were used for the 
experiments here. A single microtiter plate of PCR products provides sufficient 
material to print at least 500 arrays. 

Fluorescent probes were prepared from total Arabidopsis mRNA (4) by a single 
round of reverse transcription (5). The Arabidopsis mRNA was supplemented with 
human acetylcholine receptor (AChR) mRNA at a dilution of 1:10,000 (w/w) before 
cDNA synthesis, to provide an internal standard for calibration (5). The resulting 
fluorescently labeled cDNA mixture was hybridized to an array at high stringency (6) 
and scanned with a laser (3). A high-sensitivity scan gave signals that saturated 
the detector at nearly all of the Arabidopsis target sites (Fig. 1A). Calibration 
relative to the AChR mRNA standard (Fig. 1A) established a sensitivity limit of 
~1:50,000. No detectable hybridization was observed to either the rat 
glucocorticoid receptor (Fig. 1A) or the yeast TRP4 (Fig. 1A) targets even at the 
highest scanning sensitivity. A moderate-sensitivity scan of the same array allowed 
linear detection of the more abundant transcripts (Fig. 1B). Quantitation of both 
scans revealed a range of expression levels spanning three orders of magnitude for 
the 45 genes tested (Table 2). RNA blots (7) for several genes (Fig. 2) 
corroborated the expression levels measured with the microarray to within a factor 
of 5 (Table 2). 
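
The calibration against the AChR internal standard described above can be sketched as a simple proportionality. This is a minimal sketch and not the authors' quantitation procedure: it assumes the scanner response is linear and unsaturated for both the target and the standard, and the function names and signal values are hypothetical.

```python
def expression_level(target_signal, standard_signal, standard_dilution):
    """Estimate a transcript's abundance as a w/w fraction of total mRNA.

    Assumes signal is proportional to abundance, so the target level is the
    standard's known dilution scaled by the ratio of the two signals.
    """
    if standard_signal <= 0:
        raise ValueError("standard signal must be positive")
    return (target_signal / standard_signal) * standard_dilution


def as_ratio(level):
    """Format a w/w fraction in the '1:N' style used in the text."""
    return f"1:{round(1 / level):,}"


if __name__ == "__main__":
    # Made-up signals: a spot five times dimmer than the AChR standard,
    # which was added at 1:10,000 (w/w).
    level = expression_level(target_signal=200.0,
                             standard_signal=1000.0,
                             standard_dilution=1 / 10_000)
    print(as_ratio(level))
```

A saturated spot, such as most targets in the high-sensitivity scan, would violate the linearity assumption, which is presumably why the moderate-sensitivity scan was used for the more abundant transcripts.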

Differential gene expression was investigated with a simultaneous, two-color 
hybridization scheme, which served to minimize experimental variation inherent in the 
comparison of independent hybridizations. Fluorescent probes were prepared from two 
mRNA sources with the use of reverse transcriptase in the presence of fluorescein- and 
lissamine-labeled nucleotide analogs, respectively (5). The two probes were then 
mixed together in equal proportions, hybridized to a single array, and scanned 
separately for fluorescein and lissamine emission after independent excitation of 
the two fluorophores (3). 

To test whether overexpression of a single gene could be detected in a pool of total 
Arabidopsis mRNA, we used a microarray to analyze a transgenic line overexpressing the 
single transcription factor HAT4 (8). Fluorescent probes representing mRNA from 
wild-type and HAT4-transgenic plants were labeled with fluorescein and lissamine, 
respectively; the two probes were then mixed and hybridized to a single array. An 
intense hybridization signal was observed at the position of the HAT4 cDNA in the 
lissamine-specific scan (Fig. 1D), but not in the fluorescein-specific scan of the 
same array (Fig. 1C). Calibration with AChR mRNA added to the fluorescein and 
lissamine cDNA synthesis reactions at dilutions of 1:10,000 (Fig. 1C) and 1:100 
(Fig. 1D), respectively, revealed a 50-fold elevation of HAT4 mRNA in the transgenic 
line relative to its abundance in wild-type plants (Table 2). This magnitude of HAT4 
overexpression matched that inferred from the Northern (RNA) analysis within a factor 
of 2 (Fig. 2 and Table 2). Expression of all the other genes monitored on the array 
differed by less than a factor of 5 between HAT4-transgenic and wild-type plants 
(Fig. 1, C 



Fig. 1. Gene expression monitored with the use of cDNA microarrays. Fluorescent scans 
represented in pseudocolor correspond to hybridization intensities. Color bars were 
calibrated from the signal obtained with the use of known concentrations of human AChR 
mRNA in independent experiments. Numbers and letters on the axes mark the position of 
each cDNA. (A) High-sensitivity fluorescein scan after hybridization with 
fluorescein-labeled cDNA derived from wild-type plants. (B) Same array as in (A) but 
scanned at moderate sensitivity. (C and D) A single array was probed with a 1:1 
mixture of fluorescein-labeled cDNA from wild-type plants and lissamine-labeled cDNA 
from HAT4-transgenic plants. The single array was then scanned successively to detect 
the fluorescein fluorescence corresponding to mRNA from wild-type plants (C) and the 
lissamine fluorescence corresponding to mRNA from HAT4-transgenic plants (D). (E and 
F) A single array was probed with a 1:1 mixture of fluorescein-labeled cDNA from root 
tissue and lissamine-labeled cDNA from leaf tissue. The single array was then scanned 
successively to detect the fluorescein fluorescence corresponding to mRNAs expressed 
in roots (E) and the lissamine fluorescence corresponding to mRNAs expressed in 
leaves (F). 




Fig. 2. Gene expression monitored with RNA (Northern) blot analysis. Designated 
amounts of mRNA from wild-type and HAT4-transgenic plants were spotted onto nylon 
membranes and probed with the cDNAs indicated. Purified human AChR mRNA was used for 
calibration. 
and D, and Table 2). Hybridization of fluorescein-labeled glucocorticoid receptor 
cDNA (Fig. 1C) and lissamine-labeled TRP4 cDNA (Fig. 1D) verified the presence of 
the negative control targets and the lack of optical cross talk between the two 
fluorophores. 
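
The two-color comparison can be illustrated in the same spirit: each channel is first calibrated against its own AChR standard, and the calibrated levels are then divided gene by gene. The sketch below is an assumption-laden illustration rather than the published analysis. The signal values are invented, the function names are hypothetical, every signal is assumed to be positive and within the linear range, and the factor-of-5 threshold is borrowed from the text only as a convenient cutoff.

```python
def normalized_levels(signals, standard_gene, standard_dilution):
    """Convert raw channel signals to w/w levels using the spiked standard."""
    standard_signal = signals[standard_gene]
    return {gene: signal / standard_signal * standard_dilution
            for gene, signal in signals.items() if gene != standard_gene}


def differential_genes(levels_reference, levels_test, threshold=5.0):
    """Return fold changes (test / reference) at or beyond the threshold."""
    flagged = {}
    for gene in levels_reference.keys() & levels_test.keys():
        fold = levels_test[gene] / levels_reference[gene]
        if fold >= threshold or fold <= 1.0 / threshold:
            flagged[gene] = fold
    return flagged


if __name__ == "__main__":
    # Invented spot intensities for the two scans of one array.
    fluorescein = {"AChR": 100.0, "HAT4": 120.0, "ROC1": 800.0}  # wild type
    lissamine = {"AChR": 100.0, "HAT4": 60.0, "ROC1": 16.0}      # transgenic
    wild_type = normalized_levels(fluorescein, "AChR", 1 / 10_000)
    transgenic = normalized_levels(lissamine, "AChR", 1 / 100)
    print(differential_genes(wild_type, transgenic))
```

Because the standard was added at different dilutions in the two labeling reactions, equal raw AChR signals correspond to a 100-fold difference in calibrated level, so raw intensities from the two channels cannot be compared directly.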

To explore a more complex alteration in expression patterns, we performed a second 
two-color hybridization experiment with fluorescein- and lissamine-labeled probes 
prepared from root and leaf mRNA, respectively. The scanning sensitivities for the 
two fluorophores were normalized by matching the signals resulting from AChR 
mRNA, which was added to both cDNA synthesis reactions at a dilution of 1:1000 
(Fig. 1, E and F). A comparison of the scans revealed widespread differences in gene 
expression between root and leaf tissue (Fig. 1, E and F). The mRNA from the 
light-regulated CAB1 gene was ~500-fold more abundant in leaf (Fig. 1F) than in root 
tissue (Fig. 1E). The expression of 26 other genes differed between root and leaf 
tissue by more than a factor of 5 (Fig. 1, E and F). 

The HAT4-transgenic line we examined has elongated hypocotyls, early flowering, 
poor germination, and altered pigmentation (8). Although changes in expression were 



Table 1. Sequences contained on the cDNA microarray. Shown are the position, the 
known or putative function, and the accession number of each cDNA ... sequence in 
the database. NADH, reduced form of nicotinamide adenine dinucleotide; ATPase, 
adenosine triphosphatase; GTP, guanosine triphosphate. 

Position   cDNA     Function                        Accession number 
a1,2       AChR     Human AChR                      * 
a3,4       EST3     Actin                           H36236 
a5,6       EST6     NADH dehydrogenase              Z27010 
a7,8       AAC1     Actin 1                         M20016 
a9,10      EST12    Unknown                         U36594† 
a11,12     EST13    Actin                           T45783 
b1,2       CAB1     Chlorophyll a/b binding         Mas 160 
b3,4       EST17    Phosphoglycerate kinase         T44490 
b5,6       GA4      Gibberellic acid biosynthesis   L37126 
b7,8       EST19    Unknown                         U35595† 
b9,10      GBF-1    G-box binding factor 1          X63894 
b11,12     EST23    Elongation factor               X52256 
c1,2       EST29    Aldolase                        T04477 
c3,4       GBF-2    G-box binding factor 2          X63895 
c5,6       EST34    Chloroplast protease            R87034 
c7,8       EST35    Unknown                         T14152 
c9,10      EST41    Catalase                        T22720 
c11,12     rGR      Rat glucocorticoid receptor     M14053 
d1,2       EST42    Unknown                         U36596† 
d3,4       EST45    ATPase                          J04185 
d5,6       HAT1     Homeobox-leucine zipper 1       U09332 
d7,8       EST46    Light harvesting complex        T04063 
d9,10      EST49    Unknown                         T76267 
d11,12     HAT2     Homeobox-leucine zipper 2       U09335 
e1,2       HAT4     Homeobox-leucine zipper 4       M90394 
e3,4       EST50    Phosphoribulokinase             T04344 
e5,6       HAT5     Homeobox-leucine zipper 5       M90416 
e7,8       EST51    Unknown                         Z33675 
e9,10      HAT22    Homeobox-leucine zipper 22      U09336 
e11,12     EST52    Oxygen evolving                 T21749 
f1,2       EST59    Unknown                         Z34607 
f3,4       KNAT1    Knotted-like homeobox 1         U14174 
f5,6       EST60    RuBisCO small subunit           X14564 
f7,8       EST69    Translation elongation factor   T42799 
f9,10      PPH1     Protein phosphatase 1           U34803 
f11,12     EST70    Unknown                         T44621 
g1,2       EST75    Chloroplast protease            T43698 
g3,4       EST78    Unknown                         R65481 
g5,6       ROC1     Cyclophilin                     L14844 
g7,8       EST82    GTP binding                     X59152 
g9,10      EST83    Unknown                         Z33795 
g11,12     EST84    Unknown                         T45278 
h1,2       EST91    Unknown                         T13832 
h3,4       EST96    Unknown                         R64816 
h5,6       SAR1     Synaptobrevin                   M90418 
h7,8       EST100   Light harvesting complex        Z18205 
h9,10      EST103   Light harvesting complex        X03909 
h11,12     TRP4     Yeast tryptophan biosynthesis   X04273 



observed for HAT4, large changes in expression were not observed for any of the 
other 44 genes we examined. This was somewhat surprising, particularly because 
comparative analysis of leaf and root tissue identified 27 differentially expressed 
genes. Analysis of an expanded set of genes may be required to identify genes whose 
expression changes upon HAT4 overexpression; alternatively, a comparison of mRNA 
populations from specific tissues of wild-type and HAT4-transgenic plants may allow 
identification of downstream genes. 

At the current density of robotic printing, it is feasible to scale the fabrication 
process to produce arrays containing 20,000 cDNA targets. At this density, a single 
array would be sufficient to provide gene-specific targets encompassing nearly the 
entire repertoire of expressed genes in the Arabidopsis genome (2). The availability 
of 20,274 ESTs from Arabidopsis (1, 9) would provide a rich source of templates for 
such studies. 

The estimated 100,000 genes in the human genome (10) exceeds the number of 
Arabidopsis genes by a factor of 5 (2). This modest increase in complexity suggests 
that similar cDNA microarrays, prepared from the rapidly growing repertoire of human 
ESTs (1), could be used to determine the expression patterns of tens of thousands of 
human genes in diverse cell types. Coupling an amplification strategy to the reverse 
transcription reaction (11) could make it feasible to monitor expression even in 
minute tissue samples. A wide variety of acute and chronic physiological and 
pathological conditions might lead to characteristic changes in the patterns of gene 
expression in peripheral blood cells or other easily sampled tissues. In concert with 
cDNA microarrays for monitoring complex expression patterns, these tissues might 
therefore serve as sensitive in vivo sensors for clinical diagnosis. Microarrays of 
cDNAs could thus provide a useful link between human gene sequences and clinical 
medicine. 

Table 2. Gene expression monitoring by microarray and RNA blot analyses; tg, 
HAT4-transgenic. See Table 1 for additional gene information. Expression levels 
(w/w) were calibrated with the use of known amounts of human AChR mRNA. Values 
for the microarray were determined from microarray scans (Fig. 1); values for the 
RNA blot were determined from RNA blots (Fig. 2). 



Gene 



Expression level (w/w) 



Microarray 



RNA blot 



*Proprietary sequence of Stratagene (La Jolla, California). †No match in the database; novel EST. 
HAT4 
HAT4 (tg) 
ROC1 (tg) 



1:48 
Gene Therapy in Peripheral Blood 
Lymphocytes and Bone Marrow for 
ADA⁻ Immunodeficient Patients 

Claudio Bordignon,* Luigi D. Notarangelo, Nadia Nobili, 
Giuliana Ferrari, Giulia Casorati, Paola Panina, Evelina Mazzolari, 
Daniela Maggioni, Claudia Rossi, Paolo Servida, 
Alberto G. Ugazio, Fulvio Mavilio 

Adenosine deaminase (ADA) deficiency results in severe combined immunodeficiency, 
the first genetic disorder treated by gene therapy. Two different retroviral vectors were 
used to transfer ex vivo the human ADA minigene into bone marrow cells and peripheral 
blood lymphocytes from two patients undergoing exogenous enzyme replacement therapy. 
After 2 years of treatment, long-term survival of T and B lymphocytes, marrow cells, 
and granulocytes expressing the transferred ADA gene was demonstrated and resulted 
in normalization of the immune repertoire and restoration of cellular and humoral immunity. 
After discontinuation of treatment, T lymphocytes, derived from transduced peripheral 
blood lymphocytes, were progressively replaced by marrow-derived T cells in both 
patients. These results indicate successful gene transfer into long-lasting progenitor cells, 
producing a functional multilineage progeny. 



Severe combined immunodeficiency associated with inherited deficiency of ADA 
(1) is usually fatal unless affected children are kept in protective isolation or the 
immune system is reconstituted by bone marrow transplantation from a human 
leukocyte antigen (HLA)-identical sibling donor (2). This is the therapy of choice, 
although it is available only for a minority of patients. In recent years, other forms 
of therapy have been developed, including transplants from haploidentical donors 
(3, 4), exogenous enzyme replacement (5), and somatic-cell gene therapy (6-9). 

Department of Pediatrics, University of Brescia Medical School, Brescia, Italy. 

We previously reported a preclinical model in which ADA gene transfer and expression 
successfully restored immune functions in human ADA-deficient (ADA⁻) peripheral 
blood lymphocytes (PBLs) in immunodeficient mice in vivo (10, 11). On the basis of 
these preclinical results, the clinical application of gene therapy for the treatment of 
ADA⁻ SCID (severe combined immunodeficiency disease) patients who previously failed 
exogenous enzyme replacement therapy was approved by our Institutional Ethical 
Committees and by the Italian National Committee for Bioethics (12). In addition to 
evaluating the safety and efficacy of the gene therapy procedure, the aim of the study 
was to define the relative role of PBLs and hematopoietic stem cells in the long-term 
reconstitution of immune functions after retroviral vector-mediated ADA gene transfer. 
For this purpose, two structurally identical vectors expressing the human ADA 
complementary DNA (cDNA), distinguishable by the presence of alternative restriction 
sites in a nonfunctional region of the viral long-terminal repeat (LTR), were used to 
transduce PBLs and bone marrow (BM) cells independently. This procedure allowed 
identification of the origin of 
disclosed. The method involves dispensing 
- on the support under conditions effective 
designed to produce a microarray of such regions in an automated 
Field of the Invention 

This invention relates to a method and apparatus 
for fabricating microarrays of biological samples for 
large scale screening assays, such as arrays of DNA 
samples to be used in DNA hybridization assays for 
genetic research and diagnostic applications. 

Background of the Invention 

A variety of methods are currently available for 
making arrays of biological macromolecules, such as 
arrays of nucleic acid molecules or proteins. One 
method for making ordered arrays of DNA on a porous 
membrane is a "dot blot" approach. In this method, a 
vacuum manifold transfers a plurality, e.g., 96, 
aqueous samples of DNA from 3 millimeter diameter wells 
to a porous membrane. A common variant of this 
procedure is a "slot-blot" method in which the wells 
have highly-elongated oval shapes. 

The DNA is immobilized on the porous membrane by 
baking the membrane or exposing it to UV radiation. 
This is a manual procedure practical for making one 
array at a time and usually limited to 96 samples per 
array. "Dot-blot" procedures are therefore inadequate 
for applications in which many thousand samples must be 
determined. 

A more efficient technique employed for making 
ordered arrays of genomic fragments uses an array of 
pins dipped into the wells, e.g., the 96 wells of a 
microtitre plate, for transferring an array of samples 
to a substrate, such as a porous membrane. One array 
includes pins that are designed to spot a membrane in a 
staggered fashion, for creating an array of 9216 spots 
in a 22 X 22 cm area (Lehrach, et al., 1990). A 
limitation with this approach is that the volume of DNA 
spotted in each pixel of each array is highly variable. 
In addition, the number of arrays that can be made with 
each dipping is usually quite small. 
An alternate method of creating ordered arrays of 
nucleic acid sequences is described by Pirrung, et al. 
(1992), and also by Fodor, et al. (1991). The method 
involves synthesizing different nucleic acid sequences 
at different discrete regions of a support. This 
method employs elaborate synthetic schemes, and is 
generally limited to relatively short nucleic acid 
samples, e.g., less than 20 bases. A related method has 
been described by Southern, et al. (1992). 

Khrapko, et al. (1991) describes a method of 
making an oligonucleotide matrix by spotting DNA onto a 
thin layer of polyacrylamide. The spotting is done 
manually with a micropipette. 

None of the methods or devices described in the 
prior art are designed for mass fabrication of 
microarrays characterized by (i) a large number of 
micro-sized assay regions separated by a distance of 
50-200 microns or less, and (ii) a well-defined amount, 
typically in the picomole range, of analyte associated 
with each region of the array. 
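
To put the densities discussed above side by side, the short calculation below compares the 9216-spot, 22 X 22 cm pin-spotted array with regions spaced 50-200 microns apart. It is a rough sketch under stated assumptions: spots are taken to lie on a square grid, and the "distance" between regions is read as center-to-center pitch, which the text does not state. The function names are illustrative only.

```python
import math


def spots_per_cm2(n_spots, width_cm, height_cm):
    """Average spot density over the stated area."""
    return n_spots / (width_cm * height_cm)


def square_grid_pitch_mm(n_spots, side_cm):
    """Center-to-center pitch if n_spots fill a square grid of the given side."""
    per_side = math.isqrt(n_spots)
    if per_side * per_side != n_spots:
        raise ValueError("n_spots is not a perfect square")
    return side_cm * 10.0 / per_side


def density_from_pitch(pitch_um):
    """Spots per square centimeter for a square grid with the given pitch."""
    return (10_000.0 / pitch_um) ** 2


if __name__ == "__main__":
    print(f"pin-spotted membrane: {spots_per_cm2(9216, 22, 22):.1f} spots/cm^2, "
          f"pitch {square_grid_pitch_mm(9216, 22):.2f} mm")
    for pitch in (200, 50):
        print(f"{pitch} micron pitch: {density_from_pitch(pitch):,.0f} spots/cm^2")
```

Under these assumptions the pin-spotted membrane holds about 19 spots per square centimeter, while a 200 micron pitch corresponds to 2,500 and a 50 micron pitch to 40,000.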

Furthermore, current technology is directed at 
performing such assays one at a time to a single array 
of DNA molecules. For example, the most common method 
for performing DNA hybridizations to arrays spotted 
onto porous membrane involves sealing the membrane in a 
plastic bag (Maniatas, et al., 1989) or a rotating 
glass cylinder (Robbins Scientific) with the labeled 
hybridization probe inside the sealed chamber. For 
arrays made on non-porous surfaces, such as a 
microscope slide, each array is incubated with the 
labeled hybridization probe sealed under a coverslip. 
These techniques require a separate sealed chamber for 
each array which makes the screening and handling of 
many such arrays inconvenient and time intensive. 

Abouzied, et al. (1994) describes a method of 
printing horizontal lines of antibodies on a 
nitrocellulose membrane and separating regions of the 
membrane with vertical stripes of a hydrophobic 
material. Each vertical stripe is then reacted with a 
different antigen and the reaction between the 
immobilized antibody and an antigen is detected using a 
standard ELISA colorimetric technique. Abouzied's 
technique makes it possible to screen many one- 
dimensional arrays simultaneously on a single sheet of 
nitrocellulose. Abouzied makes the nitrocellulose 
somewhat hydrophobic using a line drawn with PAP Pen 
(Research Products International). However, Abouzied 
does not describe a technology that is capable of 
completely sealing the pores of the nitrocellulose. The 
pores of the nitrocellulose are still physically open 
and so the assay reagents can leak through the 
hydrophobic barrier during extended high temperature 
incubations or in the presence of detergents which 
makes the Abouzied technique unacceptable for DNA 
hybridization assays. 

Porous membranes with printed patterns of 
hydrophilic/hydrophobic regions exist for applications 
such as ordered arrays of bacteria colonies. QA Life 
Sciences (San Diego CA) makes such a membrane with a 
grid pattern printed on it. However, this membrane has 
the same disadvantage as the Abouzied technique since 
reagents can still flow between the gridded arrays 
making them unusable for separate DNA hybridization 
assays. 

Pall Corporation makes a 96-well plate with a 
porous filter heat sealed to the bottom of the plate. 
These plates are capable of containing different 
reagents in each well without cross-contamination. 
However, each well is intended to hold only one target 
element whereas the invention described here makes a 
microarray of many biomolecules in each subdivided 
region of the solid support. Furthermore, the 96 well 
plates are at least 1 cm thick and prevent the use of 
the device for many colorimetric, fluorescent and 
radioactive detection formats which require that the 
membrane lie flat against the detection surface. The 
invention described here requires no further processing 
after the assay step since the barrier elements are 
shallow and do not interfere with the detection step 
thereby greatly increasing convenience. 

Hyseq Corporation has described a method of making 
an "array of arrays" on a non-porous solid support for 
use with their sequencing by hybridization technique. 
The method described by Hyseq involves modifying the 
chemistry of the solid support material to form a 
hydrophobic grid pattern where each subdivided region 
contains a microarray of biomolecules. Hyseq's flat 
hydrophobic pattern does not make use of physical 
blocking as an additional means of preventing cross 
contamination. 

Summary of the Invention 

The invention includes, in one aspect, a method of 
forming a microarray of analyte-assay regions on a 
solid support, where each region in the array has a 
known amount of a selected, analyte-specific reagent. 

The method involves first loading a solution of a 
selected analyte-specific reagent in a reagent-dispensing 
device having an elongate capillary channel 
(i) formed by spaced-apart, coextensive elongate 
members, (ii) adapted to hold a quantity of the reagent 
solution and (iii) having a tip region at which aqueous 
solution in the channel forms a meniscus. The channel 
is preferably formed by a pair of spaced-apart tapered 
elements. 

The tip of the dispensing device is tapped against 
a solid support at a defined position on the support 
surface with an impulse effective to break the meniscus 
in the capillary channel and deposit a selected volume of 
solution on the surface, preferably a selected volume 
in the range 0.01 to 100 nl. The two steps are 
repeated until the desired array is formed. 

The method may be practiced in forming a plurality 
of such arrays, where the solution-depositing step is 
applied to a selected position on each of a 
plurality of solid supports at each repeat cycle. 

The dispensing device may be loaded with a new 
solution by the steps of (i) dipping the capillary 
channel of the device in a wash solution, (ii) removing 
wash solution drawn into the capillary channel, and 
(iii) dipping the capillary channel into the new 
reagent solution. 

Also included in the invention is an automated 
apparatus for forming a microarray of analyte-assay 
regions on a plurality of solid supports, where each 
region in the array has a known amount of a selected, 
analyte-specific reagent. The apparatus has a holder 
for holding, at known positions, a plurality of planar 
supports, and a reagent dispensing device of the type 
described above. 

The apparatus further includes positioning 
structure for positioning the dispensing device at a 
selected array position with respect to a support in 
said holder, and dispensing structure for moving the 
dispensing device into tapping engagement against a 
support with a selected impulse effective to deposit a 
selected volume on the support, e.g., a selected volume 
in the volume range 0.01 to 100 nl. 

The positioning and dispensing structures are 
controlled by a control unit in the apparatus. The 
unit operates to (i) place the dispensing device at a 
loading station, (ii) move the capillary channel in the 
device into a selected reagent at the loading station, 
to load the dispensing device with the reagent, and 
(iii) dispense the reagent at a defined array position 
on each of the supports on said holder. The unit may 
further operate, at the end of a dispensing cycle, to 
wash the dispensing device by (i) placing the 
dispensing device at a washing station, (ii) moving the 
capillary channel in the device into a wash fluid, to 
load the dispensing device with the fluid, and (iii) 
removing the wash fluid prior to loading the dispensing 
device with a fresh selected reagent. 

The dispensing device in the apparatus may be one 
of a plurality of such devices which are carried on the 
arm for dispensing different analyte assay reagents at 
selected spaced array positions. 

In another aspect, the invention includes a 
substrate with a surface having a microarray of at 
least 10^3 distinct polynucleotide or polypeptide 
biopolymers in a surface area of less than about 1 cm^2. 
Each distinct biopolymer (i) is disposed at a separate, 
defined position in said array, (ii) has a length of at 
least 50 subunits, and (iii) is present in a defined 
amount between about 0.1 femtomoles and 100 nanomoles. 

In one embodiment, the surface is a glass slide 
surface coated with a polycationic polymer, such as 
polylysine, and the biopolymers are polynucleotides. 
In another embodiment, the substrate has a 
water-impermeable backing, a water-permeable film formed on 
the backing, and a grid formed on the film. The grid 
is composed of intersecting water-impervious grid 
elements extending from said backing to positions 
raised above the surface of said film, and partitions 
the film into a plurality of water-impervious cells. A 
biopolymer array is formed within each cell. 

More generally, there is provided a substrate for 
use in detecting binding of labeled polynucleotides to 
one or more of a plurality of different-sequence, 
immobilized polynucleotides. The substrate includes, 
in one aspect, a glass support, a coating of a 
polycationic polymer, such as polylysine, on said 
surface of the support, and an array of distinct 
polynucleotides electrostatically bound non-covalently 
to said coating, where each distinct biopolymer is 
disposed at a separate, defined position in a surface 
array of polynucleotides. 

In another aspect, the substrate includes a 
water-impermeable backing, a water-permeable film formed on 
the backing, and a grid formed on the film, where the 
grid is composed of intersecting water-impervious grid 
elements extending from the backing to positions raised 
above the surface of the film, forming a plurality of 
cells. A biopolymer array is formed within each cell. 

Also forming part of the invention is a method of 
detecting differential expression of each of a 
plurality of genes in a first cell type, with respect 
to expression of the same genes in a second cell type. 
In practicing the method, there is first produced 
fluorescent-labeled cDNA's from mRNA's isolated from 
the two cell types, where the cDNA's from the first 
and second cells are labeled with first and second 
different fluorescent reporters. 

A mixture of the labeled cDNA's from the two cell 
types is added to an array of polynucleotides 
representing a plurality of known genes derived from 
the two cell types, under conditions that result in 
hybridization of the cDNA's to complementary-sequence 
polynucleotides in the array. The array is then 
examined by fluorescence under fluorescence excitation 
conditions in which (i) polynucleotides in the array 
that are hybridized predominantly to cDNA's derived 
from one of the first and second cell types give a 
distinct first or second fluorescence emission color, 
respectively, and (ii) polynucleotides in the array 
that are hybridized to substantially equal numbers of 
cDNA's derived from the first and second cell types 
give a distinct combined fluorescence emission color. 
The relative expression of known genes 
in the two cell types can then be determined by the 
observed fluorescence emission color of each spot. 
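
The color-based readout described above can be illustrated with a short sketch. It is not part of the original disclosure: the function name, the use of numeric intensities for the two reporters, and the two-fold threshold separating "predominantly" from "substantially equal" are all assumptions chosen only for illustration.

```python
import math


def classify_spot(first_intensity: float, second_intensity: float,
                  fold_threshold: float = 2.0) -> str:
    """Classify one array spot from its two reporter intensities.

    Returns "first" or "second" when cDNA from that cell type dominates,
    and "combined" when the two signals are roughly equal.
    The fold_threshold default of 2.0 is a hypothetical cutoff.
    """
    if first_intensity <= 0 or second_intensity <= 0:
        raise ValueError("intensities must be positive")
    # A log ratio treats over- and under-expression symmetrically.
    log_ratio = math.log2(first_intensity / second_intensity)
    limit = math.log2(fold_threshold)
    if log_ratio >= limit:
        return "first"
    if log_ratio <= -limit:
        return "second"
    return "combined"


if __name__ == "__main__":
    for pair in [(400.0, 100.0), (100.0, 400.0), (120.0, 100.0)]:
        print(pair, classify_spot(*pair))
```

A spot classified as "first" or "second" corresponds to the distinct first or second emission color, and "combined" to the mixed color seen when both cell types express the gene at similar levels.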

These and other objects and features of the 
invention will become more fully apparent when the 
following detailed description of the invention is read 
in conjunction with the accompanying figures. 

Brief Description of the Drawings 

Fig. 1 is a side view of a reagent-dispensing 
device having an open-capillary dispensing head 
constructed for use in one embodiment of the invention; 

Figs. 2A-2C illustrate steps in the delivery of a 
fixed-volume bead on a hydrophobic surface employing 
the dispensing head from Fig. 1, in accordance with one 
embodiment of the method of the invention; 

Fig. 3 shows a portion of a two-dimensional array 
of analyte-assay regions constructed according to the 
method of the invention; 

Fig. 4 is a planar view showing components of an 
automated apparatus for forming arrays in accordance 
with the invention; 

Fig. 5 shows a fluorescent image of an actual 20 x 
20 array of 400 fluorescently-labeled DNA samples 
immobilized on a poly-l-lysine coated slide, where the 
total area covered by the 400 element array is 16 
square millimeters; 

Fig. 6 is a fluorescent image of a 1.8 cm x 1.8 cm 
microarray containing lambda clones with yeast inserts, 
the fluorescent signal arising from the hybridization 
to the array with approximately half the yeast genome 
labeled with a green fluorophore and the other half 
with a red fluorophore; 

Fig. 7 shows the translation of the hybridization 
image of Fig. 6 into a karyotype of the yeast genome, 
where the elements of the Fig. 6 microarray contain yeast 
DNA sequences that have been previously physically 
mapped in the yeast genome; 

Fig. 8 shows a fluorescent image of a 0.5 cm x 0.5 
cm microarray of 24 cDNA clones, where the microarray 
was hybridized simultaneously with total cDNA from wild 
type Arabidopsis plant labeled with a green fluorophore 
and total cDNA from a transgenic Arabidopsis plant 
labeled with a red fluorophore, and the arrow points to 
the cDNA clone representing the gene introduced into 
the transgenic Arabidopsis plant; 

Fig. 9 shows a plan view of a substrate having an 
array of cells formed by barrier elements in the form 
of a grid; 

Fig. 10 shows an enlarged plan view of one of the 
cells in the substrate in Fig. 9, showing an array of 
polynucleotide regions in the cell; 

Fig. 11 is an enlarged sectional view of the 
substrate in Fig. 9, taken along a section line in that 
figure; and 

Fig. 12 is a scanned image of a 3 cm x 3 cm 
nitrocellulose solid support containing four identical 
arrays of M13 clones in each of four quadrants, where 
each quadrant was hybridized simultaneously to a 
different oligonucleotide using an open face 
hybridization method. 

Detailed Description of the Invention 

I. Definitions 

Unless indicated otherwise, the terms defined 
below have the following meanings: 

"Ligand" refers to one member of a ligand/anti-ligand 
binding pair. The ligand may be, for example, 
one of the nucleic acid strands in a complementary, 
hybridized nucleic acid duplex binding pair; an 
effector molecule in an effector/receptor binding pair; 
or an antigen in an antigen/antibody or 
antigen/antibody fragment binding pair. 

"Antiligand" refers to the opposite member of a 
ligand/anti-ligand binding pair. The antiligand may be 
the other of the nucleic acid strands in a 
complementary, hybridized nucleic acid duplex binding 
pair; the receptor molecule in an effector/receptor 
binding pair; or an antibody or antibody fragment 
molecule in an antigen/antibody or antigen/antibody 
fragment binding pair, respectively. 

"Analyte" or "analyte molecule" refers to a 
molecule, typically a macromolecule, such as a 
polynucleotide or polypeptide, whose presence, amount, 
and/or identity are to be determined. The analyte is 
one member of a ligand/anti-ligand pair. 

"Analyte-specific assay reagent" refers to a 
molecule effective to bind specifically to an analyte 
molecule. The reagent is the opposite member of a 
ligand/anti-ligand binding pair. 

An "array of regions on a solid support" is a 
linear or two-dimensional array of preferably discrete 
regions, each having a finite area, formed on the 
surface of a solid support. 

A "microarray" is an array of regions having a 
density of discrete regions of at least about 100/cm^2, 
and preferably at least about 1000/cm^2. The regions in 
a microarray have typical dimensions, e.g., diameters, 
in the range of between about 10-250 µm, and are 
separated from other regions in the array by about the 
same distance. 

A support surface is "hydrophobic" if an 
aqueous-medium droplet applied to the surface does not spread 
out substantially beyond the area size of the applied 
droplet. That is, the surface acts to prevent 
spreading of the droplet applied to the surface by 
hydrophobic interaction with the droplet. 

A "meniscus" means a concave or convex surface 
that forms on the bottom of a liquid in a channel as a 
result of the surface tension of the liquid. 

"Distinct biopolymers", as applied to the 
biopolymers forming a microarray, means an array member 
which is distinct from other array members on the basis 
of a different biopolymer sequence, and/or different 
concentrations of the same or distinct biopolymers, 
and/or different mixtures of distinct or 
different-concentration biopolymers. Thus an array of "distinct 
polynucleotides" means an array containing, as its 
members, (i) distinct polynucleotides, which may have a 
defined amount in each member, (ii) different, graded 
concentrations of given-sequence polynucleotides, 
and/or (iii) different-composition mixtures of two or 
more distinct polynucleotides. 

"Cell type" means a cell from a given source, 
e.g., a tissue, or organ, or a cell in a given state of 
differentiation, or a cell associated with a given 
pathology or genetic makeup. 

II. Method of Microarray Formation 

This section describes a method of forming a 
microarray of analyte-assay regions on a solid support 
or substrate, where each region in the array has a 
known amount of a selected, analyte-specific reagent. 

Fig. 1 illustrates, in a partially schematic view, 
a reagent-dispensing device 10 useful in practicing the 
method. The device generally includes a reagent 
dispenser 12 having an elongate open capillary channel 
14 adapted to hold a quantity of the reagent solution, 
such as indicated at 16, as will be described below. 
The capillary channel is formed by a pair of 
spaced-apart, coextensive, elongate members 12a, 12b which are 
tapered toward one another and converge at a tip or tip 
region 18 at the lower end of the channel. More 
generally, the open channel is formed by at least two 
elongate, spaced-apart members adapted to hold a 
quantity of reagent solutions and having a tip region 
at which aqueous solution in the channel forms a 
meniscus, such as the concave meniscus illustrated at 
20 in Fig. 2A. The advantages of the open channel 
construction of the dispenser are discussed below. 

With continued reference to Fig. 1, the dispenser 
device also includes structure for moving the dispenser 
rapidly toward and away from a support surface, for 
effecting deposition of a known amount of solution in 
the dispenser on a support, as will be described below 
with reference to Figs. 2A-2C. In the embodiment 
shown, this structure includes a solenoid 22 which is 
activatable to draw a solenoid piston 24 rapidly 
downwardly, then release the piston, e.g., under spring 
bias, to a normal, raised position, as shown. The 
dispenser is carried on the piston by a connecting 
member 26, as shown. The just-described moving 
structure is also referred to herein as dispensing 
means for moving the dispenser into engagement with a 
solid support, for dispensing a known volume of fluid 
on the support. 

The dispensing device just described is carried on 
an arm 28 that may be moved either linearly or in an 
x-y plane to position the dispenser at a selected 
deposition position, as will be described. 

Figs. 2A-2C illustrate the method of depositing a 
known amount of reagent solution in the just-described 
dispenser on the surface of a solid support, such as 
the support indicated at 30. The support is a polymer, 
glass, or other solid-material support having a surface 
indicated at 31. 

In one general embodiment, the surface is a 
relatively hydrophilic, i.e., wettable surface, such as 
a surface having native, bound or covalently attached 
charged groups. One such surface described below is a 
glass surface having an absorbed layer of a 
polycationic polymer, such as poly-l-lysine. 

In another embodiment, the surface has or is 
formed to have a relatively hydrophobic character, 
i.e., one that causes aqueous medium deposited on the 
surface to bead. A variety of known hydrophobic 
polymers, such as polystyrene, polypropylene, or 
polyethylene have desired hydrophobic properties, as do 
glass and a variety of lubricant or other hydrophobic 
films that may be applied to the support surface. 

Initially, the dispenser is loaded with a selected 
analyte-specific reagent solution, such as by dipping 
the dispenser tip, after washing, into a solution of 
the reagent, and allowing filling by capillary flow 
into the dispenser channel. The dispenser is now moved 
to a selected position with respect to a support 
surface, placing the dispenser tip directly above the 
support-surface position at which the reagent is to be 
deposited. This movement takes place with the 
dispenser tip in its raised position, as seen in Fig. 
2A, where the tip is typically at least several 
millimeters (1-5 mm) above the surface of the substrate. 

With the dispenser so positioned, solenoid 22 is 
now activated to cause the dispenser tip to move 
rapidly toward and away from the substrate surface, 
making momentary contact with the surface, in effect, 
tapping the tip of the dispenser against the support 
surface. The tapping movement of the tip against the 
surface acts to break the liquid meniscus in the tip 
channel, bringing the liquid in the tip into contact 
with the support surface. This, in turn, produces a 
flowing of the liquid into the capillary space between 
the tip and the surface, acting to draw liquid out of 
the dispenser channel, as seen in Fig. 2B. 

Fig. 2C shows flow of fluid from the tip onto the 
support surface, which in this case is a hydrophobic 
surface. The figure illustrates that liquid continues 
to flow from the dispenser onto the support surface 
until it forms a liquid bead 32. At a given bead size, 
i.e., volume, the tendency of liquid to flow onto the 
surface will be balanced by the hydrophobic surface 
interaction of the bead with the support surface, which 
acts to limit the total bead area on the surface, and 
by the surface tension of the droplet, which tends 
toward a given bead curvature. At this point, a given 
bead volume will have formed, and continued contact of 
the dispenser tip with the bead, as the dispenser tip 
is being withdrawn, will have little or no effect on 
bead volume. 

For liquid-dispensing on a more hydrophilic 
surface, the liquid will have less of a tendency to 
bead, and the dispensed volume will be more sensitive 
to the total dwell time of the dispenser tip in the 
immediate vicinity of the support surface, e.g., the 
positions illustrated in Figs. 2B and 2C. 

The desired deposition volume, i.e., bead volume, 
formed by this method is preferably in the range 2 pl 
(picoliters) to 2 nl (nanoliters), although volumes as 
high as 100 nl or more may be dispensed. It will be 
appreciated that the selected dispensed volume will 
depend on (i) the "footprint" of the dispenser tip, 
i.e., the size of the area spanned by the tip, (ii) the 
hydrophobicity of the support surface, and (iii) the 
time of contact with and rate of withdrawal of the tip 
from the support surface. In addition, bead size may 
be reduced by increasing the viscosity of the medium, 
effectively reducing the flow time of liquid from the 
dispenser onto the support surface. The drop size may 
be further constrained by depositing the drop in a 
hydrophilic region surrounded by a hydrophobic grid 
pattern on the support surface. 

In a typical embodiment, the dispenser tip is 
tapped rapidly against the support surface, with a 
total residence time in contact with the support of 
less than about 1 msec, and a rate of upward travel 
from the surface of about 10 cm/sec. 

Assuming that the bead that forms on contact with 
the surface is a hemispherical bead, with a diameter 
approximately equal to the width of the dispenser tip, 
as shown in Fig. 2C, the volume of the bead formed in 
relation to dispenser tip width (d) is given in Table 1 
below. As seen, the volume of the bead ranges between 
2 pl and 2 nl as the width size is increased from about 
20 to 200 µm. 

Table 1 

d            Volume (nl) 
20 µm        2 x 10^-3 
50 µm        3.1 x 10^-2 
100 µm       2.5 x 10^-1 
200 µm       2 



At a given tip size, bead volume can be reduced in 
a controlled fashion by increasing surface 
hydrophobicity, reducing time of contact of the tip 
with the surface, increasing rate of movement of the 
tip away from the surface, and/or increasing the 
viscosity of the medium. Once these parameters are 
fixed, a selected deposition volume in the desired pl 
to nl range can be achieved in a repeatable fashion. 
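
The hemispherical-bead assumption stated above can be checked numerically. The following is a minimal sketch, not part of the original disclosure; the function name is hypothetical, and the only physics used is the volume of a hemisphere whose diameter equals the tip width d, V = (2/3)·π·(d/2)^3, with 1 cubic micrometer equal to 10^-6 nl.

```python
import math


def hemispherical_bead_volume_nl(tip_width_um: float) -> float:
    """Volume (nl) of a hemispherical bead whose diameter equals the tip width."""
    if tip_width_um <= 0:
        raise ValueError("tip width must be positive")
    radius_um = tip_width_um / 2.0
    volume_um3 = (2.0 / 3.0) * math.pi * radius_um ** 3
    # 1 cubic micrometer = 1e-15 liter = 1e-6 nanoliter
    return volume_um3 * 1e-6


if __name__ == "__main__":
    for d in (20, 50, 100, 200):
        print(f"{d:>4} um -> {hemispherical_bead_volume_nl(d):.2e} nl")
```

For tip widths of 20 and 200 µm this gives roughly 2 x 10^-3 nl (about 2 pl) and 2 nl, matching the range stated in the text; because volume scales with the cube of the tip width, a tenfold change in width changes the bead volume a thousandfold.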

After depositing a bead at one selected location 
on a support, the tip is typically moved to a 
corresponding position on a second support, a droplet 
is deposited at that position, and this process is 
repeated until a liquid droplet of the reagent has been 
deposited at a selected position on each of a plurality 
of supports. 

The tip is then washed to remove the reagent 
liquid, filled with another reagent liquid, and this 
reagent is now deposited at another array position 
on each of the supports. In one embodiment, the tip is 
washed and refilled by the steps of (i) dipping the 
capillary channel of the device in a wash solution, 
(ii) removing wash solution drawn into the capillary 
channel, and (iii) dipping the capillary channel into 
the new reagent solution. 

From the foregoing, it will be appreciated that 
the tweezers-like, open-capillary dispenser tip 
provides the advantages that (i) the open channel of 
the tip facilitates rapid, efficient washing and drying 
before reloading the tip with a new reagent, (ii) 
passive capillary action can load the sample directly 
from a standard microwell plate while retaining 
sufficient sample in the open capillary reservoir for 
the printing of numerous arrays, (iii) open capillaries 
are less prone to clogging than closed capillaries, and 
(iv) open capillaries do not require a perfectly faced 
bottom surface for fluid delivery. 

A portion of a microarray 36 formed on the surface 
38 of a solid support 40 in accordance with the method 
just described is shown in Fig. 3. The array is formed 
of a plurality of analyte-specific reagent regions, 
such as regions 42, where each region may include a 
different analyte-specific reagent. As indicated 
above, the diameter of each region is preferably 
between about 20-200 µm. The spacing between each 
region and its closest (non-diagonal) neighbor, 
measured from center-to-center (indicated at 44), is 
preferably in the range of about 20-400 µm. Thus, for 
example, an array having a center-to-center spacing of 
about 250 µm contains about 40 regions/cm or 1,600 
regions/cm^2. After formation of the array, the support 
is treated to evaporate the liquid of the droplet 
forming each region, to leave a desired array of dried, 
relatively flat regions. This drying may be done by 
heating or under vacuum. 
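
The relation between center-to-center spacing and array density used in the example above is simple arithmetic, sketched below for convenience. The function name is hypothetical and the sketch assumes a square grid with the same spacing in both directions; it is not part of the original disclosure.

```python
def array_density(center_spacing_um: float) -> tuple:
    """Return (regions per cm, regions per cm^2) for a square grid.

    center_spacing_um is the center-to-center distance between
    nearest (non-diagonal) neighbors, in micrometers.
    """
    if center_spacing_um <= 0:
        raise ValueError("spacing must be positive")
    # 1 cm = 10,000 micrometers; count whole regions only.
    regions_per_cm = int(10_000 // center_spacing_um)
    return regions_per_cm, regions_per_cm ** 2


if __name__ == "__main__":
    print(array_density(250))  # (40, 1600)
```

At the 250 µm spacing cited in the text this reproduces the figures of 40 regions per cm and 1,600 regions per cm^2.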

In some cases, it is desired to first rehydrate 
the droplets containing the analyte reagents to allow 
for more time for adsorption to the solid support. It 
is also possible to spot out the analyte reagents in a 
humid environment so that droplets do not dry until the 
arraying operation is complete. 

III. Automated Apparatus for Forming Arrays 

In another aspect, the invention includes an 
automated apparatus for forming an array of 
analyte-assay regions on a solid support, where each region in 
the array has a known amount of a selected, 
analyte-specific reagent. 
The apparatus is shown in planar, partially 
schematic view in Fig. 4. A dispenser device 72 in the 
apparatus has the basic construction described above 
with respect to Fig. 1, and includes a dispenser 74 
having an open-capillary channel terminating at a tip, 
substantially as shown in Figs. 1 and 2A-2C. 

The dispenser is mounted in the device for 
movement toward and away from a dispensing position at 
which the tip of the dispenser taps a support surface, 
to dispense a selected volume of reagent solution, as 
described above. This movement is effected by a 
solenoid 76 as described above. Solenoid 76 is under 
the control of a control unit 77 whose operation will 
be described below. The solenoid is also referred to 
herein as dispensing means for moving the device into 
tapping engagement with a support, when the device is 
positioned at a defined array position with respect to 
that support. 

The dispenser device is carried on an arm 74 which 
is threadedly mounted on a worm screw 80 driven 
(rotated) in a desired direction by a stepper motor 82 
also under the control of unit 77. At its left end in 
the figure, screw 80 is carried in a sleeve 84 for 
rotation about the screw axis. At its other end, the 
screw is mounted to the drive shaft of the stepper 
motor, which in turn is carried on a sleeve 86. The 
dispenser device, worm screw, the two sleeves mounting 
the worm screw, and the stepper motor used in moving 
the device in the "x" (horizontal) direction in the 
figure form what is referred to here collectively as a 
displacement assembly 86. 

The displacement assembly is constructed to 
produce precise, micro-range movement in the direction 
of the screw, i.e., along an x axis in the figure. In 
one mode, the assembly functions to move the dispenser 
in x-axis increments having a selected distance in the 
range 5-25 µm. In another mode, the dispenser unit may 
be moved in precise x-axis increments of several 
microns or more, for positioning the dispenser at 
associated positions on adjacent supports, as will be 
described below. 

The displacement assembly, in turn, is mounted for 
movement in the "y" (vertical) axis of the figure, for 
positioning the dispenser at a selected y axis 
position. The structure mounting the assembly includes 
a fixed rod 88 mounted rigidly between a pair of frame 
bars 90, 92, and a worm screw 94 mounted for rotation 
between a pair of frame bars 96, 98. The worm screw is 
driven (rotated) by a stepper motor 100 which operates 
under the control of unit 77. The motor is mounted on 
bar 96, as shown. 

The structure just described, including worm screw 
94 and motor 100, is constructed to produce precise, 
micro-range movement in the direction of the screw, 
i.e., along a y axis in the figure. As above, the 
structure functions in one mode to move the dispenser 
in y-axis increments having a selected distance in the 
range 5-250 µm, and in a second mode, to move the 
dispenser in precise y-axis increments of several 
microns (µm) or more, for positioning the dispenser at 
associated positions on adjacent supports. 

The displacement assembly and structure for moving 
this assembly in the y axis are referred to herein 
collectively as positioning means for positioning the 
dispensing device at a selected array position with 
respect to a support. 

A holder 102 in the apparatus functions to hold a 
plurality of supports, such as supports 104 on which 
the microarrays of reagent regions are to be formed by 
the apparatus. The holder provides a number of 
recessed slots, such as slot 106, which receive the 
supports, and position them at precise selected 
positions with respect to the frame bars on which the 
dispenser moving means is mounted. 

As noted above, the control unit in the device 
functions to actuate the two stepper motors and 
dispenser solenoid in a sequence designed for automated 
operation of the apparatus in forming a selected 
microarray of reagent regions on each of a plurality of 
supports. 

The control unit is constructed, according to 
conventional microprocessor control principles, to 
provide appropriate signals to each of the solenoid and 
each of the stepper motors, in a given timed sequence 
and for appropriate signalling time. The construction 
of the unit, and the settings that are selected by the 
user to achieve a desired array pattern, will be 
understood from the following description of a typical 
apparatus operation. 

Initially, one or more supports are placed in one 
or more slots in the holder. The dispenser is then 
moved to a position directly above a well (not shown) 
containing a solution of the first reagent to be 
dispensed on the support(s). The dispenser solenoid is 
actuated now to lower the dispenser tip into this well, 
causing the capillary channel in the dispenser to fill. 
Motors 82, 100 are now actuated to position the 
dispenser at a selected array position at the first of 
the supports. Solenoid actuation of the dispenser is 
then effective to dispense a selected-volume droplet of 
that reagent at this location. As noted above, this 
operation is effective to dispense a selected volume 
preferably between 2 pl and 2 nl of the reagent 
solution. 

The dispenser is now moved to the corresponding 
position at an adjacent support and a similar volume of 
the solution is dispensed at this position. The 
process is repeated until the reagent has been 
dispensed at this preselected corresponding position on 
each of the supports. 

Where it is desired to dispense a single reagent 
at more than two array positions on a support, the 
dispenser may be moved to different array positions at 
each support, before moving the dispenser to a new 
support, or solution can be dispensed at individual 
positions on each support, at one selected position, 
then the cycle repeated for each new array position. 

To dispense the next reagent, the dispenser is 
positioned over a wash solution (not shown), and the 
dispenser tip is dipped in and out of this solution 
until the reagent solution has been substantially 
washed from the tip. Solution can be removed from the 
tip, after each dipping, by vacuum, compressed air 
spray, sponge, or the like. 

The dispenser tip is now dipped in a second 
reagent well, and the filled tip is moved to a second 
selected array position in the first support. The 
process of dispensing reagent at each of the 
30 corresponding second-array positions is then carried as 
above. This process is repeated until an entire 
microarray of reagent solutions on each of the supports 
has been formed. 
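
By way of illustration only, the dispensing sequence described above may be sketched as follows. The function name `dispensing_schedule` and the operation labels are hypothetical and do not appear in the specification; the sketch assumes one array position per reagent and simply lists, in order, the load, dispense, and wash operations for a given number of reagents and supports.

```python
def dispensing_schedule(num_reagents, num_supports):
    """Return the ordered operations for printing one array position per reagent.

    Illustrative assumption: reagent k is deposited at the corresponding
    array position k on every support before the tip is washed and reloaded.
    """
    ops = []
    for reagent in range(num_reagents):
        # Dip the tip in the reagent well so the capillary channel fills.
        ops.append(("load", reagent))
        for support in range(num_supports):
            # Tap the tip at the same (corresponding) position on each support.
            ops.append(("dispense", reagent, support))
        if reagent < num_reagents - 1:
            # Wash the tip before loading the next reagent.
            ops.append(("wash",))
    return ops


schedule = dispensing_schedule(num_reagents=2, num_supports=3)
```

In this sketch the wash step falls between reagents only, so the schedule for two reagents and three supports holds two loads, six dispenses, and one wash.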



IV. Microarray Substrate

This section describes embodiments of a substrate
having a microarray of biological polymers carried on 
the substrate surface. Subsection A describes a multi- 
cell substrate, each cell of which contains a 
microarray, and preferably an identical microarray, of 
distinct biopolymers, such as distinct polynucleotides, 
formed on a porous surface. Subsection B describes a 
microarray of distinct polynucleotides bound on a glass 
slide coated with a polycationic polymer. 



A. Multi-Cell Substrate 

Fig. 9 illustrates, in plan view, a substrate 110
constructed according to the invention. The substrate
has an 8 x 12 rectangular array 112 of cells, such as
cells 114, 116, formed on the substrate surface. With
reference to Fig. 10, each cell, such as cell 114, in
turn supports a microarray 118 of distinct biopolymers,
such as polypeptides or polynucleotides, at known,
addressable regions of the microarray. Two such
regions forming the microarray are indicated at 120,
and correspond to regions, such as regions 42, forming
the microarray of distinct biopolymers shown in Fig. 3.

The 96-cell array shown in Fig. 9 typically has
array dimensions between about 12 and 244 mm in width
and 8 and 400 mm in length, with the cells in the array
having width and length dimensions of 1/12 and 1/8 the
array width and length dimensions, respectively, i.e.,
between about 1 and 20 mm in width and 1 and 50 mm in
length.
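
The cell dimensions stated above follow from dividing the array width by 12 and the array length by 8. A minimal sketch of that arithmetic is given below; the function name `cell_dimensions` is hypothetical and is used only for illustration.

```python
def cell_dimensions(array_width_mm, array_length_mm, columns=12, rows=8):
    """Cell width and length (mm) for an array divided into columns x rows cells."""
    return array_width_mm / columns, array_length_mm / rows


smallest = cell_dimensions(12, 8)     # lower bounds of the stated array size
largest = cell_dimensions(244, 400)   # upper bounds of the stated array size
```

The lower bounds give cells of 1 mm by 1 mm, and the upper bounds give cells of roughly 20 mm by 50 mm, in agreement with the ranges recited in the text.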

The construction of the substrate is shown cross-
sectionally in Fig. 11, which is an enlarged sectional
view taken along view line 124 in Fig. 9. The
substrate includes a water-impermeable backing 126,
such as a glass slide or rigid polymer sheet. Formed
on the surface of the backing is a water-permeable film
128. The film is formed of a porous membrane material,
such as nitrocellulose membrane, or a porous web
material, such as a nylon, polypropylene, or PVDF
porous polymer material. The thickness of the film is
preferably between about 10 and 1000 μm. The film may
be applied to the backing by spraying or coating
uncured material on the backing, or by applying a
preformed membrane to the backing. The backing and
film may be obtained as a preformed unit from a
commercial source, e.g., a plastic-backed
nitrocellulose film available from Schleicher and
Schuell Corporation.

With continued reference to Fig. 11, the film-
covered surface in the substrate is partitioned into a
desired array of cells by water-impermeable grid lines,
such as lines 130, 132, which have infiltrated the film
down to the level of the backing, and extend above the
surface of the film as shown, typically a distance of
100 to 2000 μm above the film surface.

The grid lines are formed on the substrate by
laying down an uncured or otherwise flowable resin or
elastomer solution in an array grid, allowing the
material to infiltrate the porous film down to the
backing, then curing or otherwise hardening the grid
lines to form the cell-array substrate.

One preferred material for the grid is a flowable
silicone available from Loctite Corporation. The
barrier material can be extruded through a narrow
syringe (e.g., 22 gauge) using air pressure or
mechanical pressure. The syringe is moved relative to
the solid support to print the barrier elements as a
grid pattern. The extruded bead of silicone wicks into
the pores of the solid support and cures to form a
shallow waterproof barrier separating the regions of
the solid support.

In alternative embodiments, the barrier element
can be a wax-based material or a thermoset material
such as epoxy. The barrier material can also be a UV-
curing polymer which is exposed to UV light after being
printed onto the solid support. The barrier material
may also be applied to the solid support using printing 
techniques such as silk-screen printing. The barrier 
material may also be a heat-seal stamping of the porous 
solid support which seals its pores and forms a water- 
impervious barrier element. The barrier material may 
also be a shallow grid which is laminated or otherwise 
adhered to the solid support. 

In addition to plastic-backed nitrocellulose, the
solid support can be virtually any porous membrane with
or without a non-porous backing. Such membranes are
readily available from numerous vendors and are made
from nylon, PVDF, polysulfone and the like. In an
alternative embodiment, the barrier element may also be
used to adhere the porous membrane to a non-porous
backing in addition to functioning as a barrier to
prevent cross contamination of the assay reagents.

In an alternative embodiment, the solid support 
can be of a non-porous material. The barrier can be 
printed either before or after the microarray of 
biomolecules is printed on the solid support.

As can be appreciated, the cells formed by the
grid lines and the underlying backing are water-
impermeable, having side barriers projecting above the
porous film in the cells. Thus, defined-volume samples
can be placed in each well without risk of cross-
contamination with sample material in adjacent cells.
In Fig. 11, defined-volume samples, such as sample
134, are shown in the cells.

As noted above, each well contains a microarray of
distinct biopolymers. In one general embodiment, the
microarrays in the wells are identical arrays of
distinct biopolymers, e.g., different-sequence
polynucleotides. Such arrays can be formed in
accordance with the methods described in Section II, by
depositing a first selected polynucleotide at the same
selected microarray position in each of the cells, then
depositing a second polynucleotide at a different
microarray position in each well, and so on until a
complete, identical microarray is formed in each cell.

In a preferred embodiment, each microarray
contains about 10^3 distinct polynucleotide or
polypeptide biopolymers per surface area of less than
about 1 cm^2. Also in a preferred embodiment, the
biopolymers in each microarray region are present in a
defined amount between about 0.1 femtomoles and 100
nanomoles. The ability to form high-density arrays of
biopolymers, where each region is formed of a well-
defined amount of deposited material, can be achieved
in accordance with the microarray-forming method
described in Section II.

Also in a preferred embodiment, the biopolymers
are polynucleotides having lengths of at least about 50
bp, i.e., substantially longer than oligonucleotides
which can be formed in high-density arrays by schemes
involving parallel, step-wise polymer synthesis on the
array surface.

In the case of a polynucleotide array, in an assay
procedure, a small volume of the labeled DNA probe
mixture in a standard hybridization solution is loaded
onto each cell. The solution will spread to cover the
entire microarray and stop at the barrier elements.
The solid support is then incubated in a humid chamber
at the appropriate temperature as required by the
assay.

Each assay may be conducted in an "open-face"
format where no further sealing step is required, since
the hybridization solution will be kept properly
hydrated by the water vapor in the humid chamber. At
the conclusion of the incubation step, the entire solid
support containing the numerous microarrays is rinsed
quickly enough to dilute the assay reagents so that no
significant cross contamination occurs. The entire
solid support is then reacted with detection reagents
if needed and analyzed using standard colorimetric,
radioactive or fluorescent detection means. All
processing and detection steps are performed
simultaneously on all of the microarrays on the solid
support, ensuring uniform assay conditions for all of
the microarrays on the solid support.

B. Glass-Slide Polynucleotide Array

Fig. 5 shows a substrate 136 formed according to
another aspect of the invention, and intended for use
in detecting binding of labeled polynucleotides to one
or more of a plurality of distinct polynucleotides. The
substrate includes a glass substrate 138 having formed
on its surface a coating of a polycationic polymer,
preferably a cationic polypeptide, such as polylysine
or polyarginine. Formed on the polycationic coating is
a microarray 140 of distinct polynucleotides, each
localized at known selected array regions, such as
regions 142.

The slide is coated by placing a uniform-thickness
film of a polycationic polymer, e.g., poly-l-lysine, on
the surface of a slide and drying the film to form a
dried coating. The amount of polycationic polymer
added is sufficient to form at least a monolayer of
polymers on the glass surface. The polymer film is
bound to the surface via electrostatic binding between
negative silyl-OH groups on the surface and charged
amine groups in the polymers. Poly-l-lysine coated
glass slides may be obtained commercially, e.g., from
Sigma Chemical Co. (St. Louis, MO).

To form the microarray, defined volumes of
distinct polynucleotides are deposited on the polymer-
coated slide, as described in Section II. According to
an important feature of the substrate, the deposited
polynucleotides remain bound to the coated slide
surface non-covalently when an aqueous DNA sample is
applied to the substrate under conditions which allow
hybridization of reporter-labeled polynucleotides in
the sample to complementary-sequence (single-stranded)
polynucleotides in the substrate array. The method is
illustrated in Examples 1 and 2.

To illustrate this feature, a substrate of the 
type just described, but having an array of same- 
sequence polynucleotides, was mixed with fluorescent- 
labeled complementary DNA under hybridization 
conditions. After washing to remove non-hybridized 
material, the substrate was examined by low-power 
fluorescence microscopy. The array can be visualized 
by the relatively uniform labeling pattern of the array
regions.

In a preferred embodiment, each microarray
contains at least 10^3 distinct polynucleotide or
polypeptide biopolymers per surface area of less than
about 1 cm^2. In the embodiment shown in Fig. 5, the
microarray contains 400 regions in an area of about 16
mm^2, or 2.5 x 10^3 regions/cm^2. Also in a preferred
embodiment, the polynucleotides in each microarray
region are present in a defined amount between about
0.1 femtomoles and 100 nanomoles. As above, the
ability to form high-density arrays of this type, where
each region is formed of a well-defined amount of
deposited material, can be achieved in accordance with
the microarray-forming method described in Section II.

Also in a preferred embodiment, the
polynucleotides have lengths of at least about 50 bp,
i.e., substantially longer than oligonucleotides which
can be formed in high-density arrays by various in situ
synthesis schemes.
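
The density recited for the embodiment of Fig. 5 can be checked by converting the stated area from square millimeters to square centimeters. The short sketch below is illustrative only; the function name `regions_per_cm2` is hypothetical.

```python
MM2_PER_CM2 = 100  # 1 cm^2 = 10 mm x 10 mm


def regions_per_cm2(num_regions, area_mm2):
    """Array density in regions per square centimeter."""
    return num_regions * MM2_PER_CM2 / area_mm2


density = regions_per_cm2(num_regions=400, area_mm2=16)
```

For 400 regions in about 16 square millimeters the result is 2500 regions per square centimeter, which exceeds the preferred minimum density stated above.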


V. Utility 

Microarrays of immobilized nucleic acid sequences
prepared in accordance with the invention can be used
for large scale hybridization assays in numerous
genetic applications, including genetic and physical
mapping of genomes, monitoring of gene expression, DNA
sequencing, genetic diagnosis, genotyping of organisms,
and distribution of DNA reagents to researchers.

For gene mapping, a gene or a cloned DNA fragment
is hybridized to an ordered array of DNA fragments, and
the identity of the DNA elements applied to the array
is unambiguously established by the pixel or pattern of
pixels of the array that are detected. One application
of such arrays for creating a genetic map is described
by Nelson, et al. (1993). In constructing physical
maps of the genome, arrays of immobilized cloned DNA
fragments are hybridized with other cloned DNA
fragments to establish whether the cloned fragments in
the probe mixture overlap and are therefore contiguous
to the immobilized clones on the array. For example,
Lehrach, et al., describe such a process.

The arrays of immobilized DNA fragments may also
be used for genetic diagnostics. To illustrate, an
array containing multiple forms of a mutated gene or
genes can be probed with a labeled mixture of a
patient's DNA which will preferentially interact with
only one of the immobilized versions of the gene.

The detection of this interaction can lead to a
medical diagnosis. Arrays of immobilized DNA fragments
can also be used in DNA probe diagnostics. For
example, the identity of a pathogenic microorganism can
be established unambiguously by hybridizing a sample of
the unknown pathogen's DNA to an array containing many
types of known pathogenic DNA. A similar technique can
also be used for unambiguous genotyping of any
organism. Other molecules of genetic interest, such as
cDNAs and RNAs, can be immobilized on the array or
alternately used as the labeled probe mixture that is
applied to the array.

In one application, an array of cDNA clones
representing genes is hybridized with total cDNA from
an organism to monitor gene expression for research or
diagnostic purposes. Labeling total cDNA from a normal
cell with one color fluorophore and total cDNA from a
diseased cell with another color fluorophore and
simultaneously hybridizing the two cDNA samples to the
same array of cDNA clones allows for differential gene
expression to be measured as the ratio of the two
fluorophore intensities. This two-color experiment can
be used to monitor gene expression in different tissue
types, disease states, response to drugs, or response
to environmental factors. An example of this approach
is illustrated in Example 2, described with respect to
Fig. 8.
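
The ratio measurement described above may be sketched as follows. This is an illustrative sketch only: the function names, the optional background subtraction, and the use of a base-2 logarithm are assumptions made for the example and are not taken from the specification.

```python
import math


def expression_ratio(red, green, background=0.0):
    """Ratio of the two fluorophore intensities (red / green) for one array element.

    The background subtraction is an assumption of this sketch.
    """
    r = red - background
    g = green - background
    if r <= 0 or g <= 0:
        return None  # signal not above background; the ratio is undefined
    return r / g


def log2_ratio(red, green, background=0.0):
    """Base-2 logarithm of the ratio, so equal expression maps to zero."""
    ratio = expression_ratio(red, green, background)
    return None if ratio is None else math.log2(ratio)
```

Under these assumptions, an element with red intensity 800 and green intensity 200 gives a ratio of 4, i.e., a log2 ratio of 2, while an element at background level in either channel yields no ratio.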

By way of example and without implying a
limitation of scope, such a procedure could be used to
simultaneously screen many patients against all known
mutations in a disease gene. This invention could be
used in the form of, for example, 96 identical 0.9 cm x
2.2 cm microarrays fabricated on a single 12 cm x 18 cm
sheet of plastic-backed nitrocellulose where each
microarray could contain, for example, 100 DNA
fragments representing all known mutations of a given
gene. The region of interest from each of the DNA
samples from 96 patients could be amplified, labeled,
and hybridized to the 96 individual arrays with each
assay performed in 100 microliters of hybridization
solution. The approximately 1 mm thick silicone rubber
barrier elements between individual arrays prevent
cross contamination of the patient samples by sealing
the pores of the nitrocellulose and by acting as a
physical barrier between each microarray. The solid
support containing all 96 microarrays assayed with the
96 patient samples is incubated, rinsed, detected and
analyzed as a single sheet of material using standard
radioactive, fluorescent, or colorimetric detection
means (Maniatis, et al., 1989). Previously, such a
procedure would involve the handling, processing and
tracking of 96 separate membranes in 96 separate sealed
chambers. By processing all 96 arrays as a single
sheet of material, significant time and cost savings
are possible.
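
The layout given in this example can be checked with simple arithmetic. The sketch below assumes, for illustration only, that the 96 microarrays are arranged 12 across the 12 cm side and 8 along the 18 cm side; the function name `arrays_fit` is hypothetical.

```python
def arrays_fit(sheet_cm, array_cm, grid):
    """True if grid[0] x grid[1] arrays of size array_cm fit on a sheet of size sheet_cm."""
    (sheet_w, sheet_l), (arr_w, arr_l), (cols, rows) = sheet_cm, array_cm, grid
    return cols * arr_w <= sheet_w and rows * arr_l <= sheet_l


# Assumed arrangement: 12 arrays across the 12 cm side, 8 along the 18 cm side.
fits = arrays_fit(sheet_cm=(12, 18), array_cm=(0.9, 2.2), grid=(12, 8))

# 96 assays at 100 microliters each, expressed in milliliters.
total_volume_ml = 96 * 100 / 1000
```

Under that assumed arrangement the arrays occupy about 10.8 cm by 17.6 cm, leaving the remainder of the sheet for the barrier elements, and the 96 assays together consume 9.6 ml of hybridization solution.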

The assay format can be reversed where the patient
or organism's DNA is immobilized as the array elements
and each array is hybridized with a different mutated
allele or genetic marker. The gridded solid support
can also be used for parallel non-DNA ELISA assays.
Furthermore, the invention allows for the use of all
standard detection methods without the need to remove
the shallow barrier elements to carry out the detection
step.

In addition to the genetic applications listed
above, arrays of whole cells, peptides, enzymes,
antibodies, antigens, receptors, ligands,
phospholipids, polymers, drug congener preparations or
chemical substances can be fabricated by the means
described in this invention for large scale screening
assays in medical diagnostics, drug discovery,
molecular biology, immunology and toxicology.

The multi-cell substrate aspect of the invention
allows for the rapid and convenient screening of many
DNA probes against many ordered arrays of DNA
fragments. This eliminates the need to handle and
detect many individual arrays for performing mass
screenings for genetic research and diagnostic
applications. Numerous microarrays can be fabricated
on the same solid support and each microarray reacted
with a different DNA probe while the solid support is
processed as a single sheet of material.


The following examples illustrate, but in no way 
are intended to limit, the present invention. 

Example 1 

Genomic-Complexity Hybridization to Micro
DNA Arrays Representing the Yeast
Saccharomyces cerevisiae Genome with
Two-Color Fluorescent Detection

The array elements were randomly amplified PCR
(Bohlander, et al., 1992) products using physically
mapped lambda clones of S. cerevisiae genomic DNA
templates (Riles, et al., 1993). The PCR was performed
directly on the lambda phage lysates resulting in an
amplification of both the 35 kb lambda vector and the
5-15 kb yeast insert sequences in the form of a uniform
distribution of PCR product between 250-1500 base pairs
in length. The PCR product was purified using
Sephadex G50 gel filtration (Pharmacia, Piscataway, NJ)
and concentrated by evaporation to dryness at room
temperature overnight. Each of the 864 amplified
lambda clones was rehydrated in 15 μl of 3 x SSC in
preparation for spotting onto the glass.

The microarrays were fabricated on microscope
slides which were coated with a layer of poly-l-lysine
(Sigma). The automated apparatus described in Section
IV loaded 1 μl of the concentrated lambda clone PCR
product in 3 x SSC directly from 96 well storage plates
into the open capillary printing element and deposited
~5 nl of sample per slide at 380 micron spacing between
spots, on each of 40 slides. The process was repeated
for all 864 samples and 8 control spots. After the
spotting operation was complete, the slides were
rehydrated in a humid chamber for 2 hours, baked in a
dry 80 °C vacuum oven for 2 hours, rinsed to remove
unabsorbed DNA and then treated with succinic anhydride
to reduce non-specific adsorption of the labeled
hybridization probe to the poly-l-lysine coated glass
surface. Immediately prior to use, the immobilized DNA
on the array was denatured in distilled water at 90 °C
for 2 minutes.
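
The printing volumes recited above can be tallied as in the following sketch. It takes the deposited volume as 5 nl per slide, which the text gives only approximately, and the function name `printing_budget` is hypothetical.

```python
NL_PER_UL = 1000  # nanoliters per microliter


def printing_budget(load_ul, deposit_nl, num_slides):
    """Volume deposited per load (nl) and the fraction of the loaded volume it represents."""
    used_nl = deposit_nl * num_slides
    return used_nl, used_nl / (load_ul * NL_PER_UL)


# One load of 1 microliter, about 5 nl deposited on each of 40 slides.
used_nl, fraction = printing_budget(load_ul=1, deposit_nl=5, num_slides=40)

# 864 sample spots plus 8 control spots on each slide.
spots_per_slide = 864 + 8
```

On these figures each load deposits about 200 nl, or one fifth of the loaded microliter, and each slide carries 872 spots.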

For the pooled chromosome experiment, the 16
chromosomes of Saccharomyces cerevisiae were separated
in a CHEF agarose gel apparatus (Biorad, Richmond, CA).
The six largest chromosomes were isolated in one gel
slice and the smallest 10 chromosomes in a second gel
slice. The DNA was recovered using a gel extraction
kit (Qiagen, Chatsworth, CA). The two chromosome pools
were randomly amplified in a manner similar to that
used for the target lambda clones. Following
amplification, 5 micrograms of each of the amplified
chromosome pools were separately random-primer labeled
using Klenow polymerase (Amersham, Arlington Heights,
IL) with a lissamine conjugated nucleotide analog
(Dupont NEN, Boston, MA) for the pool containing the
six largest chromosomes, and with a fluorescein
conjugated nucleotide analog (BMB) for the pool
containing the smallest ten chromosomes. The two pools
were mixed and concentrated using an ultrafiltration
device (Amicon, Danvers, MA).

Five micrograms of the hybridization probe
consisting of both chromosome pools in 7.5 μl of TE was
denatured in a boiling water bath and then snap cooled
on ice. 2.5 μl of concentrated hybridization solution
(5 x SSC and 0.1% SDS) was added and all 10 μl
transferred to the array surface, covered with a cover
slip, placed in a custom-built single-slide humidity
chamber and incubated at 60 °C for 12 hours. The slides
were then rinsed at room temperature in 0.1 x SSC and
0.1% SDS for 5 minutes, cover slipped and scanned.

A custom built laser fluorescent scanner was used
to detect the two-color hybridization signals from the
1.8 x 1.8 cm array at 20 micron resolution. The
scanned image was gridded and analyzed using custom
image analysis software. After correcting for optical
crosstalk between the fluorophores due to their
overlapping emission spectra, the red and green
hybridization values for each clone on the array were
correlated to the known physical map position of the
clone resulting in a computer-generated color karyotype
of the yeast genome.
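
The specification does not state how the crosstalk correction was carried out. Purely as an illustration, one common approach is linear two-channel unmixing, in which each measured channel is modeled as its own signal plus a fixed fraction of the other. The function name and both crosstalk coefficients below are hypothetical and would have to be measured for a given scanner and fluorophore pair.

```python
def correct_crosstalk(measured_red, measured_green, green_into_red, red_into_green):
    """Recover the underlying red and green signals from crosstalk-affected readings.

    Assumed model (not from the specification):
        measured_red   = red   + green_into_red * green
        measured_green = green + red_into_green * red
    """
    det = 1.0 - green_into_red * red_into_green
    if det == 0:
        raise ValueError("crosstalk coefficients make the system singular")
    red = (measured_red - green_into_red * measured_green) / det
    green = (measured_green - red_into_green * measured_red) / det
    return red, green
```

For example, with assumed coefficients of 0.25 (green into red) and 0.5 (red into green), readings of 112.5 and 100 correspond under this model to underlying signals of 100 and 50.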

Figure 6 shows the hybridization pattern of the
two chromosome pools. A red signal indicates that the
lambda clone on the array surface contains a cloned
genomic DNA segment from one of the largest six yeast
chromosomes. A green signal indicates that the lambda
clone insert comes from one of the smallest ten yeast
chromosomes. Orange signals indicate repetitive
sequences which cross hybridized to both chromosome
pools. Control spots on the array confirm that the
hybridization is specific and reproducible.

The physical map locations of the genomic DNA
fragments contained in each of the clones used as array
elements have been previously determined by Olson and
co-workers (Riles, et al.), allowing for the automatic
generation of the color karyotype shown in Figure 7.
The color of a chromosomal section on the karyotype
corresponds to the color of the array element
containing the clone from that section. The black
regions of the karyotype represent false negative dark
spots on the array (10%) or regions of the genome not
covered by the Olson clone library (90%). Note that
the largest six chromosomes are mainly red while the
smallest ten chromosomes are mainly green, matching the
original CHEF gel isolation of the hybridization probe.
Areas of the red chromosomes containing green spots and
vice-versa are probably due to spurious sample tracking
errors in the formation of the original library and in
the amplification and spotting procedures.

The yeast genome arrays have also been probed with
individual clones or pools of clones that are
fluorescently labeled for physical mapping purposes.
The hybridization signals of these clones to the array
were translated into a position on the physical map of
yeast.

Example 2 

Total cDNA Hybridized to Micro Arrays of
cDNA Clones with Two-Color
Fluorescent Detection

24 clones containing cDNA inserts from the plant
Arabidopsis were amplified using PCR. Salt was added
to the purified PCR products to a final concentration
of 3 x SSC. The cDNA clones were spotted on poly-l-
lysine coated microscope slides in a manner similar to
Example 1. Among the cDNA clones was a clone
representing a transcription factor HAT4, which had
previously been used to create a transgenic line of the
plant Arabidopsis, in which this gene is present at ten
times the level found in wild-type Arabidopsis (Schena,
et al., 1992).

Total poly-A mRNA from wild type Arabidopsis was
isolated using standard methods (Maniatis, et al.,
1989) and reverse transcribed into total cDNA, using
fluorescein nucleotide analog to label the cDNA product
(green fluorescence). A similar procedure was
performed with the transgenic line of Arabidopsis where
the transcription factor HAT4 was inserted into the
genome using standard gene transfer protocols. cDNA
copies of mRNA from the transgenic plant were labeled
with a lissamine nucleotide analog (red fluorescence).
Two micrograms of the cDNA products from each type of
plant were pooled together and hybridized to the cDNA
clone array in a 10 microliter hybridization reaction
in a manner similar to Example 1. Rinsing and
detection of hybridization were also performed in a
manner similar to Example 1. Fig. 8 shows the resulting
hybridization pattern of the array.

Genes equally expressed in wild type and the
transgenic Arabidopsis appeared yellow due to equal
contributions of the green and red fluorescence to the
final signal. The dots are different intensities of
yellow, indicating various levels of gene expression.
The cDNA clone representing the transcription factor
HAT4, expressed in the transgenic line of Arabidopsis
but not detectably expressed in wild type Arabidopsis,
appears as a red dot (with the arrow pointing to it),
indicating the preferential expression of the
transcription factor in the red-labeled transgenic
Arabidopsis and the relative lack of expression of the
transcription factor in the green-labeled wild type
Arabidopsis.

An advantage of the microarray hybridization
format for gene expression studies is the high partial
concentration of each cDNA species achievable in the 10
microliter hybridization reaction. This high partial
concentration allows for detection of rare transcripts
without the need for PCR amplification of the
hybridization probe which may bias the true genetic
representation of each discrete cDNA species.

Gene expression studies such as these can be used
for genomics research to discover which genes are
expressed in which cell types, disease states,
development states or environmental conditions. Gene
expression studies can also be used for diagnosis of
disease by empirically correlating gene expression
patterns to disease states.

Example 3 

Multiplexed Colorimetric Hybridization on
a Gridded Solid Support

A sheet of plastic-backed nitrocellulose was
gridded with barrier elements made from silicone rubber
according to the description in Section IV-A. The
sheet was soaked in 10 x SSC and allowed to dry. As
shown in Fig. 12, 192 M13 clones, each with a different
yeast insert, were arrayed 400 microns apart in four
quadrants of the solid support using the automated
device described in Section III. The bottom left
quadrant served as a negative control for hybridization
while each of the other three quadrants was hybridized
simultaneously with a different oligonucleotide using
the open-face hybridization technology described in
Section IV-A. The first two and last four elements of
each array are positive controls for the colorimetric
detection step.

The oligonucleotides were labeled with fluorescein 
which was detected using an anti-fluorescein antibody
conjugated to alkaline phosphatase that precipitated an 
NBT/BCIP dye on the solid support (Amersham) . Perfect 
matches between the labeled oligos and the M13 clones 
resulted in dark spots visible to the naked eye and 
detected using an optical scanner (HP ScanJet II) 
attached to a personal computer. The hybridization 
patterns are different in every quadrant indicating 
that each oligo found several unique M13 clones from 
among the 192 with a perfect sequence match. Note that 
the open capillary printing tip leaves detectable 
dimples on the nitrocellulose which can be used to 
automatically align and analyze the images. 

Although the invention has been described with 
respect to specific embodiments and methods, it will be 
clear that various changes and modifications may be made
without departing from the invention.

IT IS CLAIMED:

1. A method of forming a microarray of analyte- 
assay regions on a solid support, where each region in 
the array has a known amount of a selected, analyte-
specific reagent, said method comprising,

(a) loading a solution of a selected analyte- 
specific reagent in a reagent-dispensing device having 
an elongate capillary channel (i) formed by spaced- 
apart, coextensive elongate members, (ii) adapted to 
hold a quantity of the reagent solution and (iii) 
having a tip region at which aqueous solution in the 
channel forms a meniscus, 

(b) tapping the tip of the dispensing device 
against a solid support at a defined position on the 
surface, with an impulse effective to break the 
meniscus in the capillary channel and deposit a 
selected volume of solution on the surface, and 

(c) repeating steps (a) and (b) until said array
is formed.

2. The method of claim 1, wherein said tapping is
carried out with an impulse effective to deposit a
selected volume in the volume range between 0.01 to 100
nl.



3. The method of claim 1, wherein said channel is 
formed by a pair of spaced-apart tapered elements. 

4. The method of claim 1, for forming a plurality

of such arrays, wherein step (b) is applied to a 
selected position on each of a plurality of solid 
supports at each repeat cycle proceeding step (c) . 



WO 95/35505                                   PCT/US95/07659

5. The method of claim 1, which further includes, 
after performing steps (a) and (b) at least one time, 
reloading the reagent-dispensing device with a new 
reagent solution by the steps of (i) dipping the 
capillary channel of the device in a wash solution,
(ii) removing wash solution drawn into the capillary 
channel, and (iii) dipping the capillary channel into 
the new reagent solution. 

6. Automated apparatus for forming a microarray
of analyte-assay regions on a plurality of solid
supports, where each region in the array has a known
amount of a selected, analyte-specific reagent, said
apparatus comprising

(a) a holder for holding, at known positions, a
plurality of planar supports,

(b) a reagent dispensing device having an open
capillary channel (i) formed by spaced-apart,
coextensive elongate members (ii) adapted to hold a
quantity of the reagent solution and (iii) having a tip
region at which aqueous solution in the channel forms a
meniscus,

(c) positioning means for positioning the
dispensing device at a selected array position with
respect to a support in said holder,

(d) dispensing means for moving the device into
tapping engagement against a support with a selected
impulse, when the device is positioned at a defined
array position with respect to that support, with an
impulse effective to break the meniscus of liquid in
the capillary channel and deposit a selected volume of
solution on the surface, and

(e) control means for controlling said positioning
and dispensing means.

7. The apparatus of claim 6, wherein said
dispensing means is effective to move said dispensing
device against a support with an impulse effective to
deposit a selected volume in the volume range between
0.01 to 100 nl.

8. The apparatus of claim 6, wherein said channel 
is formed by a pair of spaced-apart tapered elements. 

10 9. The apparatus of claim 6, wherein the control 

means operates to (i) place the dispensing device at a 
loading station, (ii) move the capillary channel in the 
device into a selected reagent at the loading station, 
to load the dispensing device with the reagent, and 
(iii) dispense the reagent at a defined array position 
on each of the supports on said holder. 






10. The apparatus of claim 6, wherein the control 
device further operates, at the end of a dispensing 
cycle, to wash the dispensing device by (i) placing the 
dispensing device at a washing station, (ii) moving the 
capillary channel in the device into a wash fluid, to 
load the dispensing device with the fluid, and (iii) 
remove the wash fluid prior to loading the dispensing 
device with a fresh selected reagent. 

11. The apparatus of claim 6, wherein said device 
is one of a plurality of such devices which are carried 
on the arm for dispensing different analyte assay 
reagents at selected spaced array positions. 

12. A substrate with a surface having a 
microarray of at least 10^3 distinct polynucleotide or 
polypeptide biopolymers per 1 cm^2 surface area, each 
distinct biopolymer sample (i) being disposed at a 
separate, defined position in said array, (ii) having a 
length of at least 50 subunits, and (iii) being present 
in a defined amount between about 0.1 femtomole and 100 
nanomoles. 

13. The substrate of claim 12, wherein said 
surface is glass slide coated with polylysine, and said 
biopolymers are polynucleotides. 


14. The substrate of claim 12, wherein said 
substrate has a water-impermeable backing, a water-permeable 
film formed on the backing, and a grid formed 
on the film, where said grid (i) is composed of 
intersecting water-impervious grid elements extending 
from said backing to positions raised above the surface 
of said film, and (ii) partitions the film into a 
plurality of water-impervious cells, where each cell 
contains such a biopolymer array. 


15. A substrate with a surface array of sample- 
receiving cells, comprising 

a water-impermeable backing, 

a water-permeable film formed on the backing, and 

a grid formed on the film, said grid being composed of 
intersecting water-impervious grid elements extending 
from said backing to positions raised above the surface 
of said film. 

16. The substrate of claim 15, wherein the cells 
of the array each contain an array of biopolymers. 

17. A substrate for use in detecting binding of 
labeled biopolymers to one or more of a plurality 
distinct polynucleotides, comprising 

a non-porous, glass substrate, 

a coating of a cationic polymer on said substrate, 

and 

an array of distinct polynucleotides to said 
coating, where each biopolymer is disposed at a 
separate, defined position in a surface array of 
biopolymers. 

18. A method of detecting differential expression 
of each of a plurality of genes in a first cell type 
with respect to expression of the same genes in a 
second cell types, said method comprising 

producing fluorescence-labeled cDNA's from mRNA's 
isolated from the two cells types, where the cDNA's 
from the first and second cells are labeled with first 
and second different fluorescent reporters, 

adding a mixture of the labeled cDNA's from the 
two cell types to an array of polynucleotides 
representing a plurality of known genes derived from 
the two cell types, under conditions that result in 
hybridization of the cDNA's to complementary-sequence 
polynucleotides in the array; and 

examining the array by fluorescence under 
fluorescence excitation conditions in which (i) 
polynucleotides in the array that are hybridized 
predominantly to cDNA's derived from one of the first 
and second cell types give a distinct first or second 
fluorescence emission color, respectively, and (ii) 
polynucleotides in the array that are hybridized to 
substantially equal numbers of cDNA's derived from the 
first and second cell types give a distinct combined 
fluorescence emission color, respectively, 

wherein the relative expression of known genes in 
the two cell types can be determined by the observed 
fluorescence emission color of each spot. 



19. The method of claim 18, wherein the array of 
polynucleotides is formed on a substrate with a surface 
having an array of at least 10^3 distinct polynucleotide 
or polypeptide biopolymers in a surface area of less 
than about 1 cm^2, each distinct biopolymer (i) being 
disposed at a separate, defined position in said array, 
(ii) having a length of at least 50 subunits, and (iii) 
being present in a defined amount between about .1 
femtomole and 100 nmoles. 

20. The method of claim 19, wherein said surface 
is a glass slide coated with polylysine, and said 
biopolymers are polynucleotides non-covalently bound to 
said polylysine. 

Fig. 2C 

ABSTRACT cDNA microarray technology is used to profile 
complex diseases and discover novel disease-related genes. In 
inflammatory disease such as rheumatoid arthritis, expression 
patterns of diverse cell types contribute to the pathology. We 
have monitored gene expression in this disease state with a 
microarray of selected human genes of probable significance in 
inflammation as well as with genes expressed in peripheral 
human blood cells. Messenger RNA from cultured macrophages, 
chondrocyte cell lines, primary chondrocytes, and synoviocytes 
provided expression profiles for the selected cytokines, chemokines, 
DNA binding proteins, and matrix-degrading metalloproteinases. 
Comparisons between tissue samples of rheumatoid 
arthritis and inflammatory bowel disease verified the involvement 
of many genes and revealed novel participation of the 
cytokine interleukin 3, chemokine Groα and the 
metalloproteinase matrix metallo-elastase in both diseases. From the 
peripheral blood library, tissue inhibitor of metalloproteinase 1, 
ferritin light chain, and manganese superoxide dismutase genes 
were identified as expressed differentially in rheumatoid arthritis 
compared with inflammatory bowel disease. These results 
successfully demonstrate the use of the cDNA microarray system 
as a general approach for dissecting human diseases. 



The recently described cDNA microarray or DNA-chip tech- 
nology allows expression monitoring of hundreds and thou- 
sands of genes simultaneously and provides a format for 
identifying genes as well as changes in their activity (1, 2). 
Using this technology, two-color fluorescence patterns of 
differential gene expression in the root versus the shoot tissue 
of Arabidopsis were obtained in a specific array of 48 genes (1). 
In another study using a 1000 gene array from a human 
peripheral blood library, novel genes expressed by T cells were 
identified upon heat shock and protein kinase C activation (3). 

The technology uses cDNA sequences or cDNA inserts of a 
library for PCR amplification that are arrayed on a glass slide with 
high speed robotics at a density of 1000 cDNA sequences per cm². 
These microarrays serve as gene targets for hybridization to 
cDNA probes prepared from RNA samples of cells or tissues. A 
two-color fluorescence labeling technique is used in the prepa- 
ration of the cDNA probes such that a simultaneous hybridization 
but separate detection of signals provides the comparative anal- 
ysis and the relative abundance of specific genes expressed (1, 2). 
Microarrays can be constructed from specific cDNA clones of 
interest, a cDNA library, or a select number of open reading 
frames from a genome sequencing database to allow a large-scale 
functional analysis of expressed sequences. 

Because of the wide spectrum of genes and endogenous 
mediators involved, the microarray technology is well suited 
for analyzing chronic diseases. In rheumatoid arthritis (RA), 
inflammation of the joint is caused by the gene products of 
many different cell types present in the synovium and cartilage 
tissues plus those infiltrating from the circulating blood. The 
autoimmune and inflammatory nature of the disease is a 
cumulative result of genetic susceptibility factors and multiple 
responses, paracrine and autocrine in nature, from macro- 
phages, T cells, plasma cells, neutrophils, synovial fibroblasts, 
chondrocytes, etc. Growth factors, inflammatory cytokines 
(4), and the chemokines (5) are the important mediators of this 
inflammatory process. The ensuing destruction of the cartilage 
and bone by the invading synovial tissue includes the actions 
of prostaglandins and leukotrienes (6), and the matrix degrading 
metalloproteinases (MMPs). The MMPs are an important 
class of Zn-dependent metallo-endoproteinases that can collectively 
degrade the proteoglycan and collagen components of 
the connective tissue matrix (7). 

This paper presents a study in which the involvement of 
select classes of molecules in RA was examined. Also investigated 
were 1000 human genes randomly selected from a 
peripheral human blood cell library. Their differential and 
quantitative expression analysis in cells of the joint tissue, in 
diseased RA tissue and in inflammatory bowel disease (IBD) 
tissues was conducted to demonstrate the utility of the microarray 
method to analyze complex diseases by their pattern 
of gene expression. Such a survey provides insight not only into 
the underlying cause of the pathology, but also provides the 
opportunity to selectively target genes for disease intervention 
by appropriate drug development and gene therapies. 

METHODS 

Microarray Design, Development, and Preparation. Two ap- 
proaches for the fabrication of cDNA microarrays were used in 
this study. In the first approach, known human genes of probable 
significance in RA were identified. Regions of the clones, pref- 
erably 1 kb in length, were selected by their proximity to the 3' end 
of the cDNA and for areas of least identity to related and 
repetitive sequences. Primers were synthesized to amplify the 
target regions by standard PCR protocols (3). Products were 
verified by gel electrophoresis and purified with Qiaquick 96-well 
purification kit (Qiagen, Chatsworth, CA), lyophilized (Savant), 
and resuspended in 5 μl of 3× standard saline citrate (SSC) buffer 
for arraying. In the second approach, the microarray containing 
the 1056 human genes from the peripheral blood lymphocyte 
library was prepared as described (3). 

Tissue Specimens. Rheumatoid synovial tissue was obtained 
from patients with late stage classic RA undergoing remedial 
synovectomy or arthroplasty of the knee. Synovial tissue was 
separated from any associated connective tissue or fat. One 
gram of each synovial specimen was subjected to RNA extrac- 
tion within 40 min of surgical excision, or explants were 
cultured in serum-free medium to examine any changes under 
in vitro conditions. For IBD, specimens of macroscopically 
inflamed lower intestinal mucosa were obtained from patients 
with Crohn disease undergoing remedial surgery. The hyper- 
trophied mucosal tissue was separated from underlying con- 
nective tissue and extracted for RNA. 

Cultured Cells. The Mono Mac-6 (MM6) monocytic cells 
(8) were grown in RPMI medium. Human chondrosarcoma 
SW1353 cells, primary human chondrocytes, and synoviocytes 
(9, 10) were cultured in DMEM; all culture media were 
supplemented with 10% fetal bovine serum, 100 μg/ml streptomycin, 
and 500 units/ml penicillin. Treatment of cells with 
lipopolysaccharide (LPS) endotoxin at 30 ng/ml, phorbol 
12-myristate 13-acetate (PMA) at 50 ng/ml, tumor necrosis 
factor α (TNF-α) at 50 ng/ml, interleukin (IL)-1β at 30 ng/ml, 
or transforming growth factor-β (TGF-β) at 100 ng/ml is 
described in the figure legends. 



Fluorescent Probe, Hybridization, and Scanning. Isolation of 
mRNA, probe preparation, and quantitation with Arabidopsis 
control mRNAs were essentially as described (3) except for the 
following minor modification. Following the reverse transcriptase 
step, the appropriate Cy3- and Cy5-labeled samples were pooled; 
mRNA was degraded by heating the sample to 65°C for 10 min with 
the addition of 5 μl of 0.5 M NaOH plus 0.5 ml of 10 mM EDTA. 
The pooled cDNA was purified from unincorporated nucleotides 
by gel filtration in Centri-spin columns (Princeton Separations, 
Adelphia, NJ). Samples were lyophilized and dissolved in 6 μl of 
hybridization buffer (5× SSC plus 0.2% SDS). Hybridizations, 
washes, scanning, quantitation procedures, and pseudocolor representations 
of fluorescent images have been described (3). Scans 
for the two fluorescent probes were normalized either to the 
fluorescence intensity of Arabidopsis mRNAs spiked into the 
labeling reactions (see Figs. 2-4) or to the signal intensity of 
β-actin and glyceraldehyde-3-phosphate dehydrogenase 
(GAPDH; see Fig. 5). 
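
As a rough illustration of the normalization described above, the following Python sketch rescales one fluorescence channel so that the control spots (spiked Arabidopsis mRNAs or housekeeping genes) have the same mean intensity in both scans. The function name, the dictionary layout, and the example intensities are assumptions made for illustration; they are not taken from the study, and the actual procedure is the one described in ref. 3.

```python
from statistics import mean


def normalize_to_controls(cy3, cy5, control_spots):
    """Rescale Cy5 intensities so control spots match the Cy3 control mean.

    cy3, cy5: dicts mapping a spot name to its raw intensity (hypothetical layout).
    control_spots: names of spots expected to be equal in both channels.
    """
    cy3_control = mean(cy3[s] for s in control_spots)
    cy5_control = mean(cy5[s] for s in control_spots)
    if cy5_control == 0:
        raise ValueError("control spots have no Cy5 signal")
    scale = cy3_control / cy5_control
    # Apply one global scale factor to every spot in the Cy5 scan.
    return {spot: value * scale for spot, value in cy5.items()}


# Made-up intensities: the Cy5 scan is half as bright on the controls.
cy3_scan = {"ctrl1": 100, "ctrl2": 200, "TNF": 50}
cy5_scan = {"ctrl1": 50, "ctrl2": 100, "TNF": 400}
cy5_normalized = normalize_to_controls(cy3_scan, cy5_scan, ["ctrl1", "ctrl2"])
```

A single global scale factor is the simplest choice; it assumes the two channels differ only by a constant gain across the slide.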

RESULTS 

Ninety-Six-Gene Microarray Design. The actions of cytokines, 
growth factors, chemokines, transcription factors, MMPs, prostaglandins, 
and leukotrienes are well recognized in inflammatory 
disease, particularly RA (11-14). Fig. 1 displays the selected genes 
for this study and also includes control cDNAs of housekeeping 
genes such as β-actin and GAPDH and genes from Arabidopsis 
for signal normalization and quantitation (row A, columns 1-12). 

Defining Microarray Assay Conditions. Different lengths and 
concentrations of target DNA were tested by arraying PCR-amplified 
products ranging from 0.2 to 1.2 kb at concentrations 
of 1 μg/μl or less. No significant difference in the signal levels was 
observed within this range of target size and only with 0.2-kb 
length was a signal reduced upon an 8-fold dilution of the 1 μg/μl 
sample (data not shown). In this study the average length of the 
targets was 1 kb, with a few exceptions in the range of ≈300 bp, 
arrayed at a concentration of 1 μg/μl. Normally one PCR provided 
sufficient material to fabricate up to 1000 microarray targets. 

Fig. 1. Ninety-six-element microarray design. The target element name and the corresponding gene are shown in the layout. Some genes have 
more than one target element to guarantee specificity of signal. For TNF the targets represent decreasing lengths of 1, 0.8, 0.6, 0.4, and 0.2 kb from 
left to right. 

In considering positional effects in the development of the 
targets for the microarrays, selection was biased toward the 3' 
proximal regions, because the signal was reduced if the target 
fragment was biased toward the 5' end (data not shown). This 
result was anticipated since the hybridizing probe is prepared by 
reverse transcription with oligo(dT)-primed mRNA and is richer 
in 3' proximal sequences. Cross-hybridizations of probes to 
targets of a gene family were analyzed with the matrix metalloproteinases 
as the example because they can show regions of 
sequence identities of greater than 70%. With collagenase-1 
(Col-1) and collagenase-2 (Col-2) genes as targets with up to 70% 
sequence identity, and stromelysin-1 (Strom-1) and stromelysin-2 
(Strom-2) genes with different degrees of identity, our results 
showed that a short region of overlap, even with 70-90% se- 
quence identity, produced a low level of cross-hybridization. 
However, shorter regions of identity spread over the length of the 
target resulted in cross-hybridization (data not shown). For 
closely related genes, targets were designed by avoiding long 
stretches of homology. For members of a gene family two or more 
target regions were included to discriminate between specificity 
of signal versus cross-hybridization. 
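
One way to screen candidate target regions against a related gene, in the spirit of the design rule above, is to compute the overall percent identity and the longest uninterrupted stretch of identical bases between two pre-aligned sequences. The Python sketch below is only an illustration under stated assumptions: the sequences are taken to be already aligned to equal length with "-" marking gaps, the function names and example sequences are hypothetical, and no threshold from the study is implied.

```python
def percent_identity(seq_a, seq_b):
    """Percent of aligned positions carrying the same base (gaps never match)."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned to the same non-zero length")
    matches = sum(a == b and a != "-" for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def longest_identical_run(seq_a, seq_b):
    """Length of the longest uninterrupted stretch of identical bases."""
    longest = current = 0
    for a, b in zip(seq_a, seq_b):
        current = current + 1 if (a == b and a != "-") else 0
        longest = max(longest, current)
    return longest


# Made-up aligned fragments with mismatches at the fifth and tenth positions.
target = "ACGTACGTAC"
relative = "ACGTTCGTAA"
identity = percent_identity(target, relative)
run = longest_identical_run(target, relative)
```

Reporting both numbers separates a target whose identity is concentrated in one long stretch from one whose identity is spread across the whole length.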

Monitoring Differential Expression in Cultured Cell Lines. In 
RA tissue, the monocyte/macrophage population plays a prominent 
role in phagocytic and immunomodulatory activities. Typically 
these cells, when triggered by an immunogen, produce the 
proinflammatory cytokines TNF and IL-1. We have used the 
monocyte cell line MM6 and monitored changes in gene expression 
upon activation with LPS endotoxin, a component of Gram-negative 
bacterial membranes, and PMA, which augments the 
action of LPS on TNF production (15). RNA was isolated at 
different times after induction and used for cDNA probe preparation. 
From this time course it was clear that TNF expression 
was induced within 15 min of treatment, reached maximum levels 
in 1 hr, remained high until 4 hr and subsequently declined (Fig. 
2A). Many other cytokine genes were also transiently activated, 
such as IL-1α and -β, IL-6, and granulocyte colony-stimulating 
factor (GCSF). Prominent chemokines activated were IL-8, macrophage 
inflammatory protein (MIP)-1β, more so than MIP-1α, 
and Groα or melanoma growth stimulatory factor. Migration 
inhibitory factor (MIF) expressed in the uninduced state declined 
in LPS-activated cells. Of the immediate early genes, the noticeable 
ones were c-fos, fra-1, c-jun, NF-κBp50, and IκB, with c-rel 
expression observed even in the uninduced state (Fig. 2B). These 
expression patterns are consistent with reported patterns of 
activation of certain LPS- and PMA-induced genes (12). Demonstrated 
here is the unique ability of this system to allow parallel 
visualization of a large number of gene activities over a period of 
time. 

Fig. 2. Time course for LPS/PMA-induced MM6 cells. Array elements are described in Fig. 1. (A) Pseudocolor representations of fluorescent 
scans correspond to gene expression levels at each time point. The array is made up of 8 Arabidopsis control targets and 86 human cDNA targets, 
the majority of which are genes with known or suspected involvement in inflammation. The color bars provide a comparative calibration scale 
between arrays and are derived from the Arabidopsis mRNA samples that are introduced in equal amounts during probe preparation. Fluorescent 
probes were made by labeling mRNA from untreated MM6 cells or LPS and PMA treated cells. mRNA was isolated at indicated times after 
induction. (B I-IV) The two-color samples were cohybridized, and microarray scans provided the data for the levels of select transcripts at different 
time points relative to abundance at time zero. The analysis was performed using normalized data collected from 8-bit images. 

Biochemistry: Heller et al. 
Proc. Natl. Acad. Sci. USA 94 (1997) 2153 

The SW1353 cell line is derived from malignant tumors of the 
cartilage and behaves much like the chondrocytes upon stimulation 
with TNF and IL-1 in the expression of MMPs (9). In 
addition to confirming our earlier observations with Northern 
blots on Strom-1, Col-1, and Col-3 expression (9), gelatinase 
(Gel) A, putative metalloproteinase (PUMP)-1, membrane-type 
matrix metalloproteinase, tissue inhibitors of matrix 
metalloproteinases or tissue inhibitor of metalloproteinase 1 
(TIMP-1), -2, and -3 were also expressed by these cells together 
with the human matrix metallo-elastase (HME; Fig. 3). HME 
induction was estimated to be ≈50-fold and was greater than 
any of the other MMPs examined (Fig. 3B). This result was 
unexpected because HME is reportedly expressed only by 
alveolar macrophage and placental cells (16). Expression of 
the cytokines and chemokines, IL-6, IL-8, MIF, and MIP-1β 
was also noted. A variety of other genes, including certain 
transcription factors, were also up-regulated (Fig. 3), but the 
overall time-dependent expression of genes in the SW1353 
cells was qualitatively distinct from the MM6 cells. 

Quantitation of differential gene expression (Figs. 2B and 
3B) was achieved with the simultaneous hybridization of 
Cy3-labeled cDNA from untreated cells and Cy5-labeled 
cDNA from treated samples. The estimated increases in 
expression from these microarrays for a select number of genes 
including IL-1β, IL-8, MIP-1β, TNF, HME, Col-1, Col-3, 
Strom-1, and Strom-2 were compared with data collected from 
dot blot analysis. Results (not shown) were in close agreement 
and confirmed our earlier observations on the use of the 
microarray method for the quantitation of gene expression (3). 
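
The two-color quantitation described above reduces, for each spot, to a ratio of the treated (Cy5) signal to the untreated (Cy3) signal. The Python sketch below shows that arithmetic under stated assumptions: the intensities are taken to be already normalized, the function names and the `floor` parameter (which guards against division by a near-zero background) are hypothetical, and the example numbers are invented rather than drawn from the study.

```python
import math


def fold_change(treated, untreated, floor=1.0):
    """Ratio of treated to untreated signal, with both values clamped at `floor`."""
    return max(treated, floor) / max(untreated, floor)


def log2_ratio(treated, untreated, floor=1.0):
    """Log2 of the fold change; positive means up-regulated in treated cells."""
    return math.log2(fold_change(treated, untreated, floor))


# Invented normalized intensities for a single spot.
induction = fold_change(5000.0, 100.0)    # treated signal 50 times the untreated
doubling_twice = log2_ratio(400.0, 100.0)
```

Clamping at a floor keeps spots with no detectable untreated signal from producing infinite ratios; the floor value itself would have to be chosen from the scanner background.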

Expression Profiles in Primary Chondrocytes and Synoviocytes 
of Human RA Tissue. Given the sensitivity and the 
specificity of this method, expression profiles of primary 
synoviocytes and chondrocytes from diseased tissue were 
examined. Without prior exposure to inducing agents, low level 
expression of c-jun, GCSF, IL-3, TNF-β, MIF, and RANTES 
(regulated upon activation, normal T cell expressed and secreted) 
was seen as well as expression of MMPs, GelA, 
Strom-1, Col-1, and the three TIMPs. In this case, Col-2 
hybridization was considered to be nonspecific because the 
second Col-2 target taken from the 3' end of the gene gave no 
signal. Treatment more so with PMA and IL-1, than TNF and 
IL-1, produced a dramatic up-regulation in expression of 
several genes in both of these primary cell types. These genes 
are as follows: the cytokine IL-6, the chemokines IL-8 and 
Gro-1α, and the MMPs; Strom-1, Col-1, Col-3, and HME; and 
the adhesion molecule, vascular cell adhesion molecule 1 
(VCAM-1). The surprise again is HME expression in these 
primary cells, for reasons discussed above. From these results, 
the expression profiles of synoviocytes and the chondrocytes 
appear very similar; the differences are more quantitative than 
qualitative. Treatment of the primary chondrocytes with the 
anabolic growth factor TGF-β had an interesting profile in that 
it produced a remarkable down-regulation of genes expressed 
in both the untreated and induced state (Fig. 4). 

Fig. 3. Time course for IL-1 and TNF-induced SW1353 cells 
using the inflammation array (Fig. 1). (A) Pseudocolor representations 
of fluorescent scans correspond to gene expression levels at each time 
point. (B I-IV) Relative levels of selected genes at different time points 
compared with time zero. 

A. Human synovial fibroblasts. B. Human articular chondrocytes. 
Fig. 4. Expression profiles for early passage primary synoviocytes and 
chondrocytes isolated from RA tissue, cultured in the presence of 10% 
fetal calf serum and activated with PMA and IL-1β, or TNF and IL-1β, 
or TGF-β for 18 hr. The color bars provide a comparative calibration scale 
between arrays and are derived from Arabidopsis mRNA samples that 
are introduced in equal amounts during probe preparation. 

Given the demonstrated effectiveness of this technology, a 
comparative analysis of two different inflammatory disease 
states was conducted with probes made from RA tissue and 
IBD samples. RA samples were from late stage rheumatoid 
synovial tissue, and IBD specimens were obtained from in- 
flamed lower intestinal mucosa of patients with Crohn disease. 
With both the 96-element known gene microarray and the 
1000-gene microarray of cDNAs selected from a peripheral 
human blood cell library (3), distinct differences in gene 
expression patterns were evident. On the 96-gene array, RA 
tissue samples from different affected individuals gave similar 
profiles (data not shown) as did different samples from the 
same individual (Fig. 5). These patterns were notably similar 
to those observed with primary synoviocytes and chondrocytes 
(Fig. 4). Included in the list of prominently up-regulated genes 
are IL-6, the MMPs Strom-1, Col-1, GelA, HME, and in 
certain samples PUMP, TIMPs, particularly TIMP-1 and 
TIMP-3, and the adhesion molecule VCAM. Discernible levels 
of macrophage chemotactic protein 1 (MCP-1), MIF and 
RANTES were also noted. IBD samples were, in comparison, 
rather subdued although IL-1 converting enzyme (ICE), 
TIMP-1, and MIF were notable in all the three different IBD 
samples examined here. In IBD-A, one of three individual 
samples, ICE, VCAM, Groα, and MMP expression was more 
pronounced than in the others. 

Fig. 5. Expression profiles of RA tissue (A) and IBD tissue (B). 
mRNA from RA tissue samples obtained from the same individual was 
isolated directly after excision (RA 21.5A) or maintained in culture 
without serum for 2 hr (RA 21.5B) or for 6 hr (RA 21.5C). Profiles 
from tissue samples of two other individuals (data not shown) were 
remarkably similar to the ones shown here. IBD-A and IBD-CI are 
from mRNA samples prepared directly after surgery from two separate 
individuals. For the IBD-CII probe, the tissue sample was cultured 
in medium without serum for 2 hr before mRNA preparation. 

We also made use of a peripheral blood cDNA library (3) 
to identify genes expressed by lymphocytes infiltrating the 
inflamed tissues from the circulating blood. With the 1046- 
element array of randomly selected cDNAs from this library, 
probes made from RA and IBD samples showed hybridizations 
to a large number of genes. Of these, many were common 
between the two disease tissues while others were differentially 
expressed (data not shown). A complete survey of these genes 
was beyond the scope of this study, but for this report we 
picked three genes that were up-regulated in the RA tissue 
relative to IBD. These cDNAs were sequenced and identified 
by comparison to the GenBank database. They are TIMP-1, 
apoferritin light chain, and manganese superoxide dismutase 
(MnSOD). Differential expression of MnSOD was only ob- 
served in samples of RA tissue explants maintained in growth 
medium without serum for anywhere between 2 to 16 hr. These 
results also indicate that the expression profile of genes can be 
altered when explants are transferred to culture conditions. 

DISCUSSION 

The speed, ease, and feasibility of simultaneously monitoring 
differential expression of hundreds of genes with the cDNA 
microarray based system (1-3) is demonstrated here in the 
analysis of a complex disease such as RA. Many different cell 
types in the RA tissue (macrophages, lymphocytes, plasma cells, 
neutrophils, synoviocytes, chondrocytes, etc.) are known to contribute 
to the development of the disease with the expression of 
gene products known to be proinflammatory. They include the 
cytokines, chemokines, growth factors, MMPs, eicosanoids, and 
others (7, 11-14), and the design of the 96-element known gene 
microarray was based on this knowledge and depended on the 
availability of the genes. The technology was validated by confirming 
earlier observations on the expression of TNF by the 
monocyte cell line MM6, and of Col-1 and Col-3 expression in the 
chondrosarcoma cells and articular chondrocytes (9, 12). In our 
time-dependent survey the chronological order of gene activities 
in and between gene families was compared and the results have 
provided unprecedented profiles of the cytokines (TNF, IL-1, 
IL-6, GCSF, and MIF), chemokines (MIP-1α, MIP-1β, IL-8, and 
Gro-1), certain transcription factors, and the matrix metalloproteinases 
(GelA, Strom-1, Col-1, Col-3, HME) in the macrophage 
cell line MM6 and in the SW1353 chondrosarcoma cells. 

Earlier reports of cytokine production in the diseased state had 
established a model in which TNF is a major participant in RA. 
Its expression reportedly preceded that of the other cytokines and 
effector molecules (4). Our results strongly support this model, 
as demonstrated in the time course of the MM6 cells where TNF 
induction preceded that of IL-1α and IL-1β followed by IL-6 and 
GCSF. These expression profiles demonstrate the utility of the 
microarrays in determining the hierarchy of signaling events. 

In the SW1353 chondrosarcoma cells, all the known MMPs and 
TIMPs were examined simultaneously. HME expression was 
discovered, which previously had been observed in only the 
stromal cells and alveolar macrophages of smokers' lungs and in 
placental tissue. Its presence in cells of the RA tissue is meaningful 
because its activity can cause significant destruction of 
elastin and basement membrane components (16, 17). Expression 
profiles of synovial fibroblasts and articular chondrocytes were 
remarkably similar and not too different from the SW1353 cells, 
indicating that the fibroblast and the chondrocyte can play equally 
aggressive roles in joint erosion. Prominent genes expressed were 
the MMPs, but chemokines and cytokines were also produced by 
these cells. The effect of the anabolic growth factor TGF-β was 
profoundly evident in demonstrating the down regulation of these 
catabolic activities. 

RA tissue samples undeniably reflected profiles similar to 
the cell types examined. Active genes observed were IL-3, IL-6, 
ICE, the MMPs including HME and TIMPs, chemokines IL-8, 
Groα, MIP, MIF, and RANTES, and the adhesion molecule 
VCAM. Of the growth factors, fibroblast growth factor β was 
observed most frequently. In comparison, the expression 
patterns in the other inflammatory state (i.e., IBD) were not 
as marked as in the RA samples, at least as obtained from the 
tissue samples selected for this study. 

As an alternative approach, the 1046 cDNA microarray of 
randomly selected genes from a lymphocyte library was used to 
identify genes expressed in RA tissue (3). Many genes on this 
array hybridized with probes made from both RA and IBD tissue 
samples. The results are not surprising because inflammatory 
tissue is abundantly supplied with cell types infiltrating from the 
circulating blood, made apparent also by the high levels of 
chemokine expression in RA tissue. Because of the magnitude of 
the effort required to identify all the hybridized genes, we have for 
this report chosen to describe only three differentially expressed 
genes mainly to verify this method of analysis. 

Of the large number of genes observed here, a fair number 
were already known as active participants in inflammatory disease. 
These are TNF, IL-1, IL-6, IL-8, GCSF, RANTES, and 
VCAM. The novel participants not previously reported are 
HME, IL-3, ICE, and Groα. With our discovery of HME 
expression in RA, this gene becomes a target for drug intervention. 
ICE is a cysteine protease well known for its IL-1β processing 
activity (18), and recognized for its role in apoptotic cell death 
(19). Its expression in RA tissue is intriguing. IL-3 is recognized 
for its growth-promoting activity in hematopoietic cell lineages, is 
a product of activated T cells (20), and its expression in synoviocytes 
and chondrocytes of RA tissue is a novel observation. 

Like IL-8, Groα is a C-X-C subgroup chemokine and is a 
potent neutrophil and basophil chemoattractant. It down-regulates 
the expression of types I and III interstitial collagens 
(21, 22) and is seen here produced by the MM6 cells, in primary 
synoviocytes, and in RA tissue. With the presence of RANTES, 
MCP, and MIP-1β, the C-C chemokines (23), migration and 
infiltration of monocytes, particularly T cells, into the tissue are 
also enhanced (5), aiding the trafficking and recruitment of 
leukocytes into the RA tissue. Their activation, phagocytosis, 
degranulation, and respiratory bursts could be responsible for 
the induction of MnSOD in RA. MnSOD is also induced by 
TNF and IL-1 and serves a protective function against oxidative 
damage. The induction of the ferritin light chain encoding 
gene in this tissue may be for reasons similar to those for 
MnSOD. Ferritin is the major intracellular iron storage protein 
and it is responsive to intracellular oxidative stress and reactive 
oxygen intermediates generated during inflammation (24, 25). 
The active expression of TIMP-1 in RA tissue, as detected by 
the 1000-element array, is no surprise because our results have 
repeatedly shown TIMP-1 to be expressed in the constitutive 
and induced states of RA cells and tissues. 

The suitability of the cDNA microarray technology for 
profiling diseases and for identifying disease related genes is 
well documented here. This technology could provide new
targets for drug development and disease therapies, and in
doing so allow for improved treatment of chronic diseases that 
are challenging because of their complexity. 

MEASUREMENT OF GENE EXPRESSION PROFILES
IN TOXICITY DETERMINATION 

Field of the Invention

The invention relates generally to methods for detecting and monitoring
phenotypic changes in in vitro and in vivo systems for assessing and/or determining
the toxicity of chemical compounds, and more particularly, the invention relates to a
method for detecting and monitoring changes in gene expression patterns in in vitro
and in vivo systems for determining the toxicity of drug candidates.

BACKGROUND 

The ability to rapidly and conveniently assess the toxicity of new compounds
is extremely important. Thousands of new compounds are synthesized every year,
and many are introduced to the environment through the development of new
commercial products and processes, often with little knowledge of their short term
and long term health effects. In the development of new drugs, the cost of assessing
the safety and efficacy of candidate compounds is becoming astronomical: It is
estimated that the pharmaceutical industry spends an average of about 300 million
dollars to bring a new pharmaceutical compound to market, e.g. Biotechnology, 13:
226-228 (1995). A large fraction of these costs is due to the failure of candidate
compounds in the later stages of the developmental process. That is, as the
assessment of a candidate drug progresses from the identification of a compound as a
drug candidate (for example, through relatively inexpensive binding assays or in vitro
screening assays) to pharmacokinetic studies, to toxicity studies, to efficacy studies in
model systems, to preliminary clinical studies, and so on, the costs of the associated
tests and analyses increase tremendously. Consequently, it may cost several tens of
millions of dollars to determine that a once promising candidate compound possesses
a side effect or cross reactivity that renders it commercially infeasible to develop
further. A great challenge of pharmaceutical development is to remove from further
consideration as early as possible those compounds that are likely to fail in the later
stages of drug testing.

Drug development programs are clearly structured with this objective in mind;
however, rapidly escalating costs have created a need to develop even more stringent
and less expensive screens in the early stages to identify false leads as soon as
possible. Toxicity assessment is an area where such improvements may be made, for
both drug development and for assessing the environmental, health, and safety effects
of new compounds in general.

Typically the toxicity of a compound is determined by administering the
compound to one or more species of test animal under controlled conditions and by
monitoring the effects on a wide range of parameters. The parameters include such
things as blood chemistry, weight gain or loss, a variety of behavioral patterns, muscle
tone, body temperature, respiration rate, lethality, and the like, which collectively
provide a measure of the state of health of the test animal. The degree of deviation of
such parameters from their normal ranges gives a measure of the toxicity of a
compound. Such tests may be designed to assess the acute, prolonged, or chronic
toxicity of a compound. In general, acute tests involve administration of the test
chemical on one occasion. The period of observation of the test animals may be as
short as a few hours, although it is usually at least 24 hours and in some cases it may
be as long as a week or more. In general, prolonged tests involve administration of
the test chemical on multiple occasions. The test chemical may be administered one
or more times each day, irregularly as when it is incorporated in the diet, at specific
times such as during pregnancy, or in some cases regularly but only at weekly
intervals. Also, in the prolonged test the experiment is usually conducted for not less
than 90 days in the rat or mouse or a year in the dog. In contrast to the acute and
prolonged types of test, the chronic toxicity tests are those in which the test chemical
is administered for a substantial portion of the lifetime of the test animal. In the case
of the mouse or rat, this is a period of 2 to 3 years. In the case of the dog, it is for 5 to
7 years.

Significant costs are incurred in establishing and maintaining large cohorts of
test animals for such assays, especially the larger animals in chronic toxicity assays.
Moreover, because of species specific effects, passing such toxicity tests does not
ensure that a compound is free of toxic effects when used in humans. Such tests do,
however, provide a standardized set of information for judging the safety of new
compounds, and they provide a database for giving preliminary assessments of related
compounds. An important area for improving toxicity determination would be the
identification of new observables which are predictive of the outcome of the
expensive and tedious animal assays.

In other medical fields, there has been significant interest in applying recent
advances in biotechnology, particularly in DNA sequencing, to the identification and
study of differentially expressed genes in healthy and diseased organisms, e.g. Adams
et al. Science, 252: 1651-1656 (1991); Matsubara et al. Gene, 135: 265-274 (1993);
Rosenberg et al. International patent application, PCT/US95/01863. The objectives
of such applications include increasing our knowledge of disease processes,
identifying genes that play important roles in the disease process, and providing
diagnostic and therapeutic approaches that exploit the expressed genes or their





WO 97/13877



PCT/US96/16342 



products. While such approaches are attractive, those based on exhaustive, or even
sampled, sequencing of expressed genes are still beset by the enormous effort
required: It is estimated that 30-35 thousand different genes are expressed in a typical
mammalian tissue in any given state, e.g. Ausubel et al, Editors, Current Protocols,
5.8.1-5.8.4 (John Wiley & Sons, New York, 1992). Determining the sequences of
even a small sample of that number of gene products is a major enterprise, requiring
industrial-scale resources. Thus, the routine application of massive sequencing of
expressed genes is still beyond current commercial technology.

The availability of new assays for assessing the toxicity of compounds, such
as candidate drugs, that would provide more comprehensive and precise information
about the state of health of a test animal would be highly desirable. Such additional
assays would preferably be less expensive, more rapid, and more convenient than
current testing procedures, and would at the same time provide enough information to
make early judgments regarding the safety of new compounds.

Summary of the Invention
An object of the invention is to provide a new approach to toxicity assessment
based on an examination of gene expression patterns, or profiles, in in vitro or in vivo
test systems.

Another object of the invention is to provide a database on which to base
decisions concerning the toxicological properties of chemicals, particularly drug
candidates.

A further object of the invention is to provide a method for analyzing gene
expression patterns in selected tissues of test animals.

A still further object of the invention is to provide a system for identifying
genes which are differentially expressed in response to exposure to a test compound.

Another object of the invention is to provide a rapid and reliable method for
correlating gene expression with short term and long term toxicity in test animals.

Another object of the invention is to identify genes whose expression is
predictive of deleterious toxicity.

The invention achieves these and other objects by providing a method for
massively parallel signature sequencing of genes expressed in one or more selected
tissues of an organism exposed to a test compound. An important feature of the
invention is the application of novel DNA sorting and sequencing methodologies that
permit the formation of gene expression profiles for selected tissues by determining
the sequence of portions of many thousands of different polynucleotides in parallel.
Such profiles may be compared with those from tissues of control organisms at single
or multiple time points to identify expression patterns predictive of toxicity.

The sorting methodology of the invention makes use of oligonucleotide tags
that are members of a minimally cross-hybridizing set of oligonucleotides. The
sequences of oligonucleotides of such a set differ from the sequences of every other
member of the same set by at least two nucleotides. Thus, each member of such a set
cannot form a duplex (or triplex) with the complement of any other member with less
than two mismatches. Complements of oligonucleotide tags of the invention, referred
to herein as "tag complements," may comprise natural nucleotides or non-natural
nucleotide analogs. Preferably, tag complements are attached to solid phase supports.
Such oligonucleotide tags when used with their corresponding tag complements
provide a means of enhancing specificity of hybridization for sorting polynucleotides,
such as cDNAs.

The polynucleotides to be sorted each have an oligonucleotide tag attached,
such that different polynucleotides have different tags. As explained more fully
below, this condition is achieved by employing a repertoire of tags substantially
greater than the population of polynucleotides and by taking a sufficiently small
sample of tagged polynucleotides from the full ensemble of tagged polynucleotides.
After such sampling, when the populations of supports and polynucleotides are mixed
under conditions which permit specific hybridization of the oligonucleotide tags with
their respective complements, identical polynucleotides sort onto particular beads or
regions. The sorted populations of polynucleotides can then be sequenced on the
solid phase support by a "single-base" or "base-by-base" sequencing methodology, as
described more fully below.

In one aspect, the method of the invention comprises the following steps: (a)
administering the compound to a test organism; (b) extracting a population of mRNA
molecules from each of one or more tissues of the test organism; (c) forming a
separate population of cDNA molecules from each population of mRNA molecules
extracted from the one or more tissues such that each cDNA molecule of the separate
populations has an oligonucleotide tag attached, the oligonucleotide tags being
selected from the same minimally cross-hybridizing set; (d) separately sampling each
population of cDNA molecules such that substantially all different cDNA molecules
within a separate population have different oligonucleotide tags attached; (e) sorting
the cDNA molecules of each separate population by specifically hybridizing the
oligonucleotide tags with their respective complements, the respective complements
being attached as uniform populations of substantially identical complements in
spatially discrete regions on one or more solid phase supports; (f) determining the
nucleotide sequence of a portion of each of the sorted cDNA molecules of each
separate population to form a frequency distribution of expressed genes for each of
the one or more tissues; and (g) correlating the frequency distribution of expressed
genes in each of the one or more tissues with the toxicity of the compound.
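
By way of illustration only, the data handling implied by steps (f) and (g) may be sketched as follows. The sketch assumes that the sorted cDNAs have already been reduced to short signature strings. The function names, the example signatures, the two-fold threshold, and the floor value are hypothetical and are not taken from the specification.

```python
from collections import Counter


def expression_profile(signatures):
    """Return the relative frequency of each signature sequence."""
    counts = Counter(signatures)
    total = sum(counts.values())
    return {sig: n / total for sig, n in counts.items()}


def differentially_expressed(test, control, min_fold=2.0, floor=1e-6):
    """Return signatures whose frequency differs by at least min_fold.

    `floor` stands in for a signature absent from one profile so that
    the ratio is always defined (an assumption of this sketch).
    """
    changed = {}
    for sig in set(test) | set(control):
        ratio = test.get(sig, floor) / control.get(sig, floor)
        if ratio >= min_fold or ratio <= 1.0 / min_fold:
            changed[sig] = ratio
    return changed


# Hypothetical signature reads from a treated tissue and a control tissue.
treated = ["GATCAAGT"] * 6 + ["GATCTTCA"] * 2 + ["GATCGGAC"] * 2
control = ["GATCAAGT"] * 2 + ["GATCTTCA"] * 2 + ["GATCGGAC"] * 6

treated_profile = expression_profile(treated)
control_profile = expression_profile(control)
hits = differentially_expressed(treated_profile, control_profile)
print(sorted(hits))
```

In this toy data one signature rises and one falls roughly three-fold in the treated tissue, while the third is unchanged and is therefore not reported. Correlating such changes with observed toxicity, as step (g) requires, would call for profiles from many compounds and time points.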

An important aspect of the invention is the identification of genes whose
expression is predictive of the toxicity of a compound. Once such genes are
identified, they may be employed in conventional assays, such as reverse transcriptase
polymerase chain reaction (RT-PCR) assays for gene expression.

Brief Description of the Drawings 
Figure 1 is a flow chart representation of an algorithm for generating
minimally cross-hybridizing sets of oligonucleotides.

Figure 2 diagrammatically illustrates an apparatus for carrying out 
polynucleotide sequencing in accordance with the invention. 

Definitions 

"Complement" or "tag complement" as used herein in reference to
oligonucleotide tags refers to an oligonucleotide to which an oligonucleotide tag
specifically hybridizes to form a perfectly matched duplex or triplex. In embodiments
where specific hybridization results in a triplex, the oligonucleotide tag may be
selected to be either double stranded or single stranded. Thus, where triplexes are
formed, the term "complement" is meant to encompass either a double stranded
complement of a single stranded oligonucleotide tag or a single stranded complement
of a double stranded oligonucleotide tag.

The term "oligonucleotide" as used herein includes linear oligomers of natural
or modified monomers or linkages, including deoxyribonucleosides, ribonucleosides,
anomeric forms thereof, peptide nucleic acids (PNAs), and the like, capable of
specifically binding to a target polynucleotide by way of a regular pattern of
monomer-to-monomer interactions, such as Watson-Crick type of base pairing, base
stacking, Hoogsteen or reverse Hoogsteen types of base pairing, or the like. Usually
monomers are linked by phosphodiester bonds or analogs thereof to form
oligonucleotides ranging in size from a few monomeric units, e.g. 3-4, to several tens
of monomeric units. Whenever an oligonucleotide is represented by a sequence of
letters, such as "ATGCCTG," it will be understood that the nucleotides are in 5'->3'
order from left to right and that "A" denotes deoxyadenosine, "C" denotes
deoxycytidine, "G" denotes deoxyguanosine, and "T" denotes thymidine, unless
otherwise noted. Analogs of phosphodiester linkages include phosphorothioate,
phosphorodithioate, phosphoranilidate, phosphoramidate, and the like. Usually
oligonucleotides of the invention comprise the four natural nucleotides; however, they
may also comprise non-natural nucleotide analogs. It is clear to those skilled in the
art when oligonucleotides having natural or non-natural nucleotides may be
employed, e.g. where processing by enzymes is called for, usually oligonucleotides
consisting of natural nucleotides are required.

"Perfectly matched" in reference to a duplex means that the poly- or
oligonucleotide strands making up the duplex form a double stranded structure with
one another such that every nucleotide in each strand undergoes Watson-Crick
basepairing with a nucleotide in the other strand. The term also comprehends the
pairing of nucleoside analogs, such as deoxyinosine, nucleosides with 2-aminopurine
bases, and the like, that may be employed. In reference to a triplex, the term means
that the triplex consists of a perfectly matched duplex and a third strand in which
every nucleotide undergoes Hoogsteen or reverse Hoogsteen association with a
basepair of the perfectly matched duplex. Conversely, a "mismatch" in a duplex
between a tag and an oligonucleotide means that a pair or triplet of nucleotides in the
duplex or triplex fails to undergo Watson-Crick and/or Hoogsteen and/or reverse
Hoogsteen bonding.

As used herein, "nucleoside" includes the natural nucleosides, including 2'-deoxy
and 2'-hydroxyl forms, e.g. as described in Kornberg and Baker, DNA
Replication, 2nd Ed. (Freeman, San Francisco, 1992). "Analogs" in reference to
nucleosides includes synthetic nucleosides having modified base moieties and/or
modified sugar moieties, e.g. described by Scheit, Nucleotide Analogs (John Wiley,
New York, 1980); Uhlmann and Peyman, Chemical Reviews, 90: 543-584 (1990), or
the like, with the only proviso that they are capable of specific hybridization. Such
analogs include synthetic nucleosides designed to enhance binding properties, reduce
complexity, increase specificity, and the like.

As used herein "sequence determination" or "determining a nucleotide
sequence" in reference to polynucleotides includes determination of partial as well as
full sequence information of the polynucleotide. That is, the term includes sequence
comparisons, fingerprinting, and like levels of information about a target
polynucleotide, as well as the express identification and ordering of nucleosides,
usually each nucleoside, in a target polynucleotide. The term also includes the
determination of the identification, ordering, and locations of one, two, or three of the
four types of nucleotides within a target polynucleotide. For example, in some
embodiments sequence determination may be effected by identifying the ordering and
locations of a single type of nucleotide, e.g. cytosines, within the target polynucleotide
"CATCGC ..." so that its sequence is represented as a binary code, e.g. "100101 ... " for
"C-(not C)-(not C)-C-(not C)-C ... " and the like.

As used herein, the term "complexity" in reference to a population of
polynucleotides means the number of different species of molecule present in the
population.
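
The binary representation mentioned above in the definition of "sequence determination" can be illustrated with a short sketch. The function name is hypothetical; the input and output are the "CATCGC" and "100101" example given in the text.

```python
def binary_signature(sequence, base="C"):
    """Encode a sequence as 1 where `base` occurs and 0 elsewhere."""
    return "".join("1" if nt == base else "0" for nt in sequence.upper())


# The example from the text: positions of cytosine within "CATCGC".
print(binary_signature("CATCGC"))  # 100101
```

Such a code records only the ordering and locations of one nucleotide type, which is partial sequence information in the sense of the definition.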

As used herein, the terms "gene expression profile" and "gene expression
pattern," which are used equivalently, mean a frequency distribution of sequences of
portions of cDNA molecules sampled from a population of tag-cDNA conjugates.
Generally, the portions of sequence are sufficiently long to uniquely identify the
cDNA from which the portion arose. Preferably, the total number of sequences
determined is at least 1000; more preferably, the total number of sequences
determined in a gene expression profile is at least ten thousand.

As used herein, "test organism" means any in vitro or in vivo system which
provides measurable responses to exposure to test compounds. Typically, test
organisms may be mammalian cell cultures, particularly of specific tissues, such as
hepatocytes, neurons, kidney cells, colony forming cells, or the like, or test organisms
may be whole animals, such as rats, mice, hamsters, guinea pigs, dogs, cats, rabbits,
pigs, monkeys, and the like.

Detailed Description of the Invention 
The invention provides a method for determining the toxicity of a compound
by analyzing changes in the gene expression profiles in selected tissues of test
organisms exposed to the compound. The invention also provides a method of
identifying toxicity markers consisting of individual genes or a group of genes that is
expressed acutely and which is correlated with prolonged or chronic toxicity, or
suggests that the compound will have an undesirable cross reactivity. Gene
expression profiles are generated by sequencing portions of cDNA molecules
constructed from mRNA extracted from tissues of test organisms exposed to the
compound being tested. As used herein, the term "tissue" is employed with its usual
medical or biological meaning, except that in reference to an in vitro test system, such
as a cell culture, it simply means a sample from the culture. Gene expression profiles
derived from test organisms are compared to gene expression profiles derived from
control organisms to determine the genes which are differentially expressed in the test
organism because of exposure to the compound being tested. In both cases, the
sequence information of the gene expression profiles is obtained by massively parallel
signature sequencing of cDNAs, which is implemented in steps (c) through (f) of the
above method.

Toxicity Assessment 
Procedures for designing and conducting toxicity tests in in vitro and in vivo
systems are well known, and are described in many texts on the subject, such as Loomis
et al, Loomis's Essentials of Toxicology, 4th Ed. (Academic Press, New York, 1996);
Ecobichon, The Basics of Toxicity Testing (CRC Press, Boca Raton, 1992); Frazier,
editor, In Vitro Toxicity Testing (Marcel Dekker, New York, 1992); and the like.

In toxicity testing, two groups of test organisms are usually employed: one 
group serves as a control and the other group receives the test compound in a single 
dose (for acute toxicity tests) or a regimen of doses (for prolonged or chronic toxicity 
tests). Since in most cases, the extraction of tissue as called for in the method of the 
invention requires sacrificing the test animal, both the control group and the group 
receiving compound must be large enough to permit removal of animals for sampling 
tissues, if it is desired to observe the dynamics of gene expression through the 
duration of an experiment. 

In setting up a toxicity study, extensive guidance is provided in the literature
for selecting the appropriate test organism for the compound being tested, route of
administration, dose ranges, and the like. Water or physiological saline (0.9% NaCl
in water) is the solvent of choice for the test compound since these solvents permit
administration by a variety of routes. When this is not possible because of solubility
limitations, it is necessary to resort to the use of vegetable oils such as corn oil or
even organic solvents, of which propylene glycol is commonly used. Whenever
possible the use of suspensions or emulsions should be avoided except for oral
administration. Regardless of the route of administration, the volume required to
administer a given dose is limited by the size of the animal that is used. It is desirable
to keep the volume of each dose uniform within and between groups of animals.
When rats or mice are used the volume administered by the oral route should not
exceed 0.005 ml per gram of animal. Even when aqueous or physiological saline
solutions are used for parenteral injection the volumes that are tolerated are limited,
although such solutions are ordinarily thought of as being innocuous. The
intravenous LD50 of distilled water in the mouse is approximately 0.044 ml per gram
and that of isotonic saline is 0.068 ml per gram of mouse.
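
By way of illustration only, the oral volume limit stated above (0.005 ml per gram of animal for rats or mice) translates into a simple calculation. The function names and the example body weights are hypothetical.

```python
ORAL_LIMIT_ML_PER_G = 0.005  # oral limit for rats or mice stated in the text


def max_oral_volume_ml(body_weight_g):
    """Largest oral dose volume (ml) for an animal of the given weight."""
    if body_weight_g <= 0:
        raise ValueError("body weight must be positive")
    return ORAL_LIMIT_ML_PER_G * body_weight_g


def dose_volume_ok(volume_ml, body_weight_g):
    """True if the proposed oral volume does not exceed the limit."""
    return volume_ml <= max_oral_volume_ml(body_weight_g)


# Hypothetical animals: a 25 g mouse and a 250 g rat.
for weight_g in (25, 250):
    print(weight_g, "g ->", round(max_oral_volume_ml(weight_g), 3), "ml")
```

For a 25 g mouse the limit works out to about 0.125 ml, and for a 250 g rat to about 1.25 ml.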

When a compound is to be administered by inhalation, special techniques for 
generating test atmospheres are necessary. Dose estimation becomes very 
complicated. The methods usually involve aerosolization or nebulization of fluids 
containing the compound. If the agent to be tested is a fluid that has an appreciable 
vapor pressure, it may be administered by passing air through the solution under 
controlled temperature conditions. Under these conditions, dose is estimated from the 
volume of air inhaled per unit time, the temperature of the solution, and the vapor 
pressure of the agent involved. Gases are metered from reservoirs. When particles of 
a solution are to be administered, unless the particle size is less than about 2 μm the
particles will not reach the terminal alveolar sacs in the lungs. A variety of
apparatuses and chambers are available to perform studies for detecting effects of
irritant or other toxic endpoints when they are administered by inhalation. The
preferred method of administering an agent to animals is via the oral route, either by
intubation or by incorporating the agent in the feed.

Preferably, in designing a toxicity assessment, two or more species should be
employed that handle the test compound as similarly to man as possible in terms of
metabolism, absorption, excretion, tissue storage, and the like. Preferably, multiple
doses or regimens at different concentrations should be employed to establish a
dose-response relationship with respect to toxic effects. And preferably, the route of
administration to the test animal should be the same as, or as similar as possible to,
the route of administration of the compound to man. Effects obtained by one route of
administration to test animals are not a priori applicable to effects by another route of
administration to man. For example, food additives for man should be tested by
admixture of the material in the diet of the test animals.

Acute toxicity tests consist of administering a compound to test organisms on
one occasion. The purpose of such a test is to determine the symptomatology
consequent to administration of the compound and to determine the degree of lethality
of the compound. The initial procedure is to perform a series of range-finding doses
of the compound in a single species. This necessitates selection of a route of
administration, preparation of the compound in a form suitable for administration by
the selected route, and selection of an appropriate species. Preferably, initial acute
toxicity studies are performed on either rats or mice because of their low cost, their
availability, and the availability of abundant toxicologic reference data on these
species. Prolonged toxicity tests consist of administering a compound to test
organisms repeatedly, usually on a daily basis, over a period of 3 to 4 months. Two
practical factors are encountered that place constraints on the design of such tests:
First, the available routes of administration are limited because the route selected
must be suitable for repeated administration without inducing harmful effects. And
second, blood, urine, and perhaps other samples, should be taken repeatedly without
inducing significant harm to the test animals. Preferably, in the method of the
invention the gene expression profiles are obtained in conjunction with the
measurement of the traditional toxicologic parameters, such as listed in the table
below:

Hematology                    Blood Chemistry                           Urine Analyses

erythrocyte count             sodium                                    pH
total leukocyte count         potassium                                 specific gravity
differential leukocyte count  chloride                                  total protein
hematocrit                    calcium                                   sediment
hemoglobin                    carbon dioxide                            glucose
                              serum glutamic-pyruvic transaminase       ketones
                              serum glutamic-oxaloacetic transaminase   bilirubin
                              serum protein electrophoresis
                              blood sugar
                              blood urea nitrogen
                              total serum protein
                              serum albumin
                              total serum bilirubin

Oligonucleotide Tags and Tag Complements

Oligonucleotide tags are members of a minimally cross-hybridizing set of
oligonucleotides. The sequences of oligonucleotides of such a set differ from the
sequences of every other member of the same set by at least two nucleotides. Thus,
each member of such a set cannot form a duplex (or triplex) with the complement of
any other member with less than two mismatches. Complements of oligonucleotide
tags, referred to herein as "tag complements," may comprise natural nucleotides or
non-natural nucleotide analogs. Preferably, tag complements are attached to solid
phase supports. Such oligonucleotide tags when used with their corresponding tag
complements provide a means of enhancing specificity of hybridization for sorting,
tracking, or labeling molecules, especially polynucleotides.

Minimally cross-hybridizing sets of oligonucleotide tags and tag complements
may be synthesized either combinatorially or individually depending on the size of the
set desired and the degree to which cross-hybridization is sought to be minimized (or
stated another way, the degree to which specificity is sought to be enhanced). For
example, a minimally cross-hybridizing set may consist of a set of individually
synthesized 10-mer sequences that differ from each other by at least 4 nucleotides,
such set having a maximum size of 332 (when composed of 3 kinds of nucleotides
and counted using a computer program such as disclosed in Appendix Ic).
Alternatively, a minimally cross-hybridizing set of oligonucleotide tags may also be
assembled combinatorially from subunits which themselves are selected from a
minimally cross-hybridizing set. For example, a set of minimally cross-hybridizing
12-mers differing from one another by at least three nucleotides may be synthesized
by assembling 3 subunits selected from a set of minimally cross-hybridizing 4-mers
that each differ from one another by three nucleotides. Such an embodiment gives a
maximally sized set of 9^3, or 729, 12-mers. The number 9 is the number of
oligonucleotides listed by the computer program of Appendix Ia, which assumes, as
with the 10-mers, that only 3 of the 4 different types of nucleotides are used. The set
is described as "maximal" because the computer programs of Appendices Ia-c provide
the largest set for a given input (e.g. length, composition, difference in number of
nucleotides between members). Additional minimally cross-hybridizing sets may be
formed from subsets of such calculated sets.
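
The combinatorial assembly of 12-mers from 4-mer subunits can be checked with a short sketch. The nine subunits below are an illustrative set over three nucleotides (A, C, T) in which any two members differ in at least three positions; they are an assumption of this sketch and are not the set printed by the program of the appendix. The function names are likewise hypothetical.

```python
from itertools import combinations, product

# Illustrative 4-mer subunits; any two differ in at least three positions.
SUBUNITS = ["AAAA", "ACCC", "ATTT", "CACT", "CCTA",
            "CTAC", "TATC", "TCAT", "TTCA"]


def mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def assemble_tags(subunits, n_subunits):
    """Concatenate every ordered choice of n_subunits subunits."""
    return ["".join(words) for words in product(subunits, repeat=n_subunits)]


tags = assemble_tags(SUBUNITS, 3)
min_diff = min(mismatches(a, b) for a, b in combinations(tags, 2))
print(len(tags), min_diff)  # 729 3
```

Two distinct tags must differ in at least one subunit position, and two different subunits differ by at least three nucleotides, so the assembled 12-mers inherit the three-nucleotide minimum difference.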

Oligonucleotide tags may be single stranded and be designed for specific
hybridization to single stranded tag complements by duplex formation or for specific
hybridization to double stranded tag complements by triplex formation.
Oligonucleotide tags may also be double stranded and be designed for specific
hybridization to single stranded tag complements by triplex formation.

When synthesized combinatorially, an oligonucleotide tag preferably consists
of a plurality of subunits, each subunit consisting of an oligonucleotide of 3 to 9
nucleotides in length wherein each subunit is selected from the same minimally
cross-hybridizing set. In such embodiments, the number of oligonucleotide tags available
depends on the number of subunits per tag and on the length of the subunits. The
number is generally much less than the number of all possible sequences the length of
the tag, which for a tag n nucleotides long would be 4^n.
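
The relationship between repertoire size and the number of all possible sequences can be made concrete with a short calculation. The sketch assumes, as in the 12-mer example above, that a tag is built from a fixed number of subunits each chosen freely from one minimally cross-hybridizing set; the function names are hypothetical.

```python
def repertoire_size(set_size, subunits_per_tag):
    """Number of tags when each subunit is chosen from set_size words."""
    return set_size ** subunits_per_tag


def all_possible_sequences(tag_length):
    """Number of sequences of the given length over four nucleotides."""
    return 4 ** tag_length


# Nine 4-mer subunits assembled four at a time give 16-nucleotide tags.
print(repertoire_size(9, 4))       # 6561
print(all_possible_sequences(16))  # 4294967296
```

With nine 4-mer subunits and four subunits per tag, the repertoire of 6561 tags is a very small fraction of the 4^16 possible 16-mers, which is the price paid for the guaranteed minimum number of mismatches between tags.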

Complements of oligonucleotide tags attached to a solid phase support are
used to sort polynucleotides from a mixture of polynucleotides each containing a tag.
Complements of the oligonucleotide tags are synthesized on the surface of a solid
phase support, such as a microscopic bead or a specific location on an array of
synthesis locations on a single support, such that populations of identical sequences
are produced in specific regions. That is, the surface of each support, in the case of a
bead, or of each region, in the case of an array, is derivatized by only one type of
complement which has a particular sequence. The population of such beads or regions
contains a repertoire of complements with distinct sequences. As used herein in
reference to oligonucleotide tags and tag complements, the term "repertoire" means
the set of minimally cross-hybridizing oligonucleotides that make up the tags in
a particular embodiment or the corresponding set of tag complements.

The polynucleotides to be sorted each have an oligonucleotide tag attached,
such that different polynucleotides have different tags. As explained more fully
below, this condition is achieved by employing a repertoire of tags substantially
greater than the population of polynucleotides and by taking a sufficiently small
sample of tagged polynucleotides from the full ensemble of tagged polynucleotides.
After such sampling, when the populations of supports and polynucleotides are mixed
under conditions which permit specific hybridization of the oligonucleotide tags with
their respective complements, identical polynucleotides sort onto particular beads or
regions.

The nucleotide sequences of oligonucleotides of a minimally cross-hybridizing 
set are conveniently enumerated by simple computer programs, such as those 
exemplified by programs whose source codes are listed in Appendices Ia and Ib. 
Program minhx of Appendix Ia computes all minimally cross-hybridizing sets having 
4-mer subunits composed of three kinds of nucleotides. Program tagN of Appendix 
Ib enumerates longer oligonucleotides of a minimally cross-hybridizing set. Similar 
algorithms and computer programs are readily written for listing oligonucleotides of 
minimally cross-hybridizing sets for any embodiment of the invention. Table I below 
provides guidance as to the size of sets of minimally cross-hybridizing 
oligonucleotides for the indicated lengths and number of nucleotide differences. The 
above computer programs were used to generate the numbers. 

                                   Table I

Oligonucleotide   Nucleotide Difference      Maximal Size      Size of         Size of
Word Length       between Oligonucleotides   of Minimally      Repertoire      Repertoire
                  of Minimally Cross-        Cross-            with Four       with Five
                  Hybridizing Set            Hybridizing Set   Words           Words

 4                 3                             9             6561            5.90 x 10^4
 6                 3                            27             5.3 x 10^5      1.43 x 10^7
 7                 4                            27             5.3 x 10^5      1.43 x 10^7
 7                 5                             8             4096            3.28 x 10^4
 8                 3                           190             1.30 x 10^9     2.48 x 10^11
 8                 4                            62             1.48 x 10^7     9.16 x 10^8
 8                 5                            18             1.05 x 10^5     1.89 x 10^6
 9                 5                            39             2.31 x 10^6     9.02 x 10^7
10                 5                           332             1.21 x 10^10
10                 6                            28             6.15 x 10^5     1.72 x 10^7
11                 5                           187
18                 6                        ~25000
18                12                            24

For some embodiments of the invention, where extremely large repertoires of 
tags are not required, oligonucleotide tags of a minimally cross-hybridizing set may 
be separately synthesized. Sets containing several hundred to several thousands, or 
even several tens of thousands, of oligonucleotides may be synthesized directly by a 
variety of parallel synthesis approaches, e.g. as disclosed in Frank et al, U.S. patent 
4,689,405; Frank et al, Nucleic Acids Research, 11: 4365-4377 (1983); Matson et al, 
Anal. Biochem., 224: 110-116 (1995); Fodor et al, International application 
PCT/US93/04145; Pease et al, Proc. Natl. Acad. Sci., 91: 5022-5026 (1994); 
Southern et al, J. Biotechnology, 35: 217-227 (1994); Brennan, International 
application PCT/US94/05896; Lashkari et al, Proc. Natl. Acad. Sci., 92: 7912-7915 
(1995); or the like. 

Preferably, oligonucleotide tags of the invention are synthesized 
combinatorially out of subunits between three and six nucleotides in length and 
selected from the same minimally cross-hybridizing set. For oligonucleotides in this 
range, the members of such sets may be enumerated by computer programs based on 
the algorithm of Fig. 1. 

The algorithm of Fig. 1 is implemented by first defining the characteristics of 
the subunits of the minimally cross-hybridizing set, i.e. length, number of base 
differences between members, and composition, e.g. do they consist of two, three, or 
four kinds of bases. A table M_n, n=1, is generated (100) that consists of all possible 
sequences of a given length and composition. An initial subunit S_1 is selected and 
compared (120) with successive subunits S_i for i=n+1 to the end of the table. 
Whenever a successive subunit has the required number of mismatches to be a 
member of the minimally cross-hybridizing set, it is saved in a new table M_{n+1} (125), 
that also contains subunits previously selected in prior passes through step 120. For 
example, in the first set of comparisons, M_2 will contain S_1; in the second set of 
comparisons, M_3 will contain S_1 and S_2; in the third set of comparisons, M_4 will 
contain S_1, S_2, and S_3; and so on. Similarly, comparisons in table M_j will be 
between S_j and all successive subunits in M_j. Note that each successive table M_{n+1} 
is smaller than its predecessors as subunits are eliminated in successive passes 
through step 130. After every subunit of table M_n has been compared (140) the old 
table is replaced by the new table M_{n+1}, and the next round of comparisons is 
begun. The process stops (160) when a table M_n is reached that contains no 
successive subunits to compare to the selected subunit S_i, i.e. M_n=M_{n+1}. 
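
The successive-table procedure described above can be sketched in Python as follows. This is a minimal illustration, not the source code of the appendices: the function names are hypothetical, and "the required number of mismatches" is interpreted here as "at least `min_mismatches` differing positions", which is an assumption.

```python
from itertools import product

def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length subunits differ."""
    return sum(x != y for x, y in zip(a, b))

def minimally_cross_hybridizing_set(length, min_mismatches, alphabet="AGT"):
    # Step 100: table M_1 holds every sequence of the given length and composition.
    table = ["".join(p) for p in product(alphabet, repeat=length)]
    n = 0
    # Stop (160) when no successive subunits remain after the selected one.
    while n + 1 < len(table):
        selected = table[n]
        # Steps 120-130: keep only successive subunits with enough mismatches.
        survivors = [s for s in table[n + 1:]
                     if mismatches(selected, s) >= min_mismatches]
        # Step 125: the new table M_{n+1} also carries the subunits already selected.
        table = table[:n + 1] + survivors
        n += 1
    return table

if __name__ == "__main__":
    subunits = minimally_cross_hybridizing_set(length=4, min_mismatches=3)
    print(len(subunits), subunits)
```

The result depends on the order in which candidate sequences are listed, so this greedy pass yields one valid set rather than every possible set; enumerating all sets, as program minhx is said to do, would require repeating the pass from different starting subunits.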

Preferably, minimally cross-hybridizing sets comprise subunits that make 
approximately equivalent contributions to duplex stability as every other subunit in 
the set. In this way, the stability of perfectly matched duplexes between every subunit 
and its complement is approximately equal. Guidance for selecting such sets is 
provided by published techniques for selecting optimal PCR primers and calculating 
duplex stabilities, e.g. Rychlik et al, Nucleic Acids Research, 17: 8543-8551 (1989) 
and 18: 6409-6412 (1990); Breslauer et al, Proc. Natl. Acad. Sci., 83: 3746-3750 
(1986); Wetmur, Crit. Rev. Biochem. Mol. Biol., 26: 227-259 (1991); and the like. 
For shorter tags, e.g. about 30 nucleotides or less, the algorithm described by Rychlik 
and Wetmur is preferred, and for longer tags, e.g. about 30-35 nucleotides or greater, 
an algorithm disclosed by Suggs et al, pages 683-693 in Brown, editor, ICN-UCLA 
Symp. Dev. Biol., Vol. 23 (Academic Press, New York, 1981) may be conveniently 
employed. Clearly, there are many approaches available to one skilled in the art for 
designing sets of minimally cross-hybridizing subunits within the scope of the 
invention. For example, to minimize the effects of different base-stacking energies of 
terminal nucleotides when subunits are assembled, subunits may be provided that 
have the same terminal nucleotides. In this way, when subunits are linked, the sum of 
the base-stacking energies of all the adjoining terminal nucleotides will be the same, 
thereby reducing or eliminating variability in tag melting temperatures. 

A "word" of terminal nucleotides, denoted W below, may also be added to 
each end of a tag so that a perfect match is always formed between it and a similar 
terminal "word" on any other tag complement. Such an augmented tag would have 
the form: 

        W    W1    W2   ...  Wk-1    Wk    W
        W'   W1'   W2'  ...  Wk-1'   Wk'   W'

where the primed W's indicate complements. With ends of tags always forming 
perfectly matched duplexes, all mismatched words will be internal mismatches, 
thereby reducing the stability of tag-complement duplexes that otherwise would have 
mismatched words at their ends. It is well known that duplexes with internal 
mismatches are significantly less stable than duplexes with the same mismatch at a 
terminus. 

A preferred embodiment of minimally cross-hybridizing sets are those whose 
subunits are made up of three of the four natural nucleotides. As will be discussed 
more fully below, the absence of one type of nucleotide in the oligonucleotide tags 
permits target polynucleotides to be loaded onto solid phase supports by use of the 
5'->3' exonuclease activity of a DNA polymerase. The following is an exemplary 
minimally cross-hybridizing set of subunits each comprising four nucleotides selected 
from the group consisting of A, G, and T: 

                                  Table II

        Word:        w1      w2      w3      w4
        Sequence:    GATT    TGAT    TAGA    TTTG

        Word:        w5      w6      w7      w8
        Sequence:    GTAA    AGTA    ATGT    AAAG

In this set, each member would form a duplex having three mismatched bases with 
the complement of every other member. 
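
The three-mismatch property stated for Table II can be checked directly. The sketch below assumes that the number of mismatched bases in a duplex between one word and the complement of another equals the number of positions at which the two words differ; the helper name is illustrative.

```python
from itertools import combinations

# Words w1 to w8 as listed in Table II.
TABLE_II = ["GATT", "TGAT", "TAGA", "TTTG", "GTAA", "AGTA", "ATGT", "AAAG"]

def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))

# Mismatch count for every unordered pair of distinct words.
pair_mismatches = {(a, b): mismatches(a, b) for a, b in combinations(TABLE_II, 2)}

for (a, b), d in pair_mismatches.items():
    print(a, b, d)
```

Each of the 28 pairs can be inspected this way, which is a quick guard against transcription errors when a set is copied from a table.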

Further exemplary minimally cross-hybridizing sets are listed below in Table 
III. Clearly, additional sets can be generated by substituting different groups of 
nucleotides, or by using subsets of known minimally cross-hybridizing sets. 



                                 Table III

        Exemplary Minimally Cross-Hybridizing Sets of 4-mer Subunits

        Set 1    Set 2    Set 3    Set 4    Set 5    Set 6

        CATT     ACCC     AAAC     AAAG     AACA     AACG
        CTAA     AGGG     ACCA     ACCA     ACAC     ACAA
        TCAT     CACG     AGGG     AGGC     AGGG     AGGC
        ACTA     CCGA     CACG     CACC     CAAG     CAAC
        TACA     CGAC     CCGC     CCGG     CCGC     CCGG
        TTTC     GAGC     CGAA     CGAA     CGCA     CGCA
        ATCT     GCAG     GAGA     GAGA     GAGA     GAGA
        AAAC     GGCA     GCAG     GCAC     GCCG     GCCC
                 AAAA     GGCC     GGCG     GGAC     GGAG

        Set 7    Set 8    Set 9    Set 10   Set 11   Set 12

        AAGA     AAGC     AAGG     ACAG     ACCG     ACGA
        ACAC     ACAA     ACAA     AACA     AAAA     AAAC
        AGCG     AGCG     AGCC     AGGC     AGGC     AGCG
        CAAG     CAAG     CAAC     CAAC     CACC     CACA
        CCCA     CCCC     CCCG     CCGA     CCGA     CCAG
        CGGC     CGGA     CGGA     CGCG     CGAG     CGGC
        GACC     GACA     GACA     GAGG     GAGG     GAGG
        GCGG     GCGG     GCGC     GCCC     GCAC     GCCC
        GGAA     GGAC     GGAG     GGAA     GGCA     GGAA


The oligonucleotide tags of the invention and their complements are 
conveniently synthesized on an automated DNA synthesizer, e.g. an Applied 
Biosystems, Inc. (Foster City, California) model 392 or 394 DNA/RNA Synthesizer, 
using standard chemistries, such as phosphoramidite chemistry, e.g. disclosed in the 
following references: Beaucage and Iyer, Tetrahedron, 48: 2223-2311 (1992); Molko 
et al, U.S. patent 4,980,460; Koster et al, U.S. patent 4,725,677; Caruthers et al, U.S. 
patents 4,415,732; 4,458,066; and 4,973,679; and the like. Alternative chemistries, 
e.g. resulting in non-natural backbone groups, such as phosphorothioate, 
phosphoramidate, and the like, may also be employed provided that the resulting 
oligonucleotides are capable of specific hybridization. In some embodiments, tags 
may comprise naturally occurring nucleotides that permit processing or manipulation 
by enzymes, while the corresponding tag complements may comprise non-natural 
nucleotide analogs, such as peptide nucleic acids, or like compounds, that promote the 
formation of more stable duplexes during sorting. 

When microparticles are used as supports, repertoires of oligonucleotide tags 
and tag complements may be generated by subunit-wise synthesis via "split and mix" 
techniques, e.g. as disclosed in Shortle et al, International patent application 
PCT/US93/03418 or Lyttle et al, Biotechniques, 19: 274-280 (1995). Briefly, the 
basic unit of the synthesis is a subunit of the oligonucleotide tag. Preferably, 
phosphoramidite chemistry is used and 3' phosphoramidite oligonucleotides are 
prepared for each subunit in a minimally cross-hybridizing set, e.g. for the set first 
listed above, there would be eight 4-mer 3'-phosphoramidites. Synthesis proceeds as 
disclosed by Shortle et al or in direct analogy with the techniques employed to 
generate diverse oligonucleotide libraries using nucleosidic monomers, e.g. as 
disclosed in Telenius et al, Genomics, 13: 718-725 (1992); Welsh et al, Nucleic Acids 
Research, 19: 5275-5279 (1991); Grothues et al, Nucleic Acids Research, 21: 
1321-1322 (1993); Hartley, European patent application 90304496.4; Lam et al, Nature, 
354: 82-84 (1991); Zuckerman et al, Int. J. Pept. Protein Research, 40: 498-507 
(1992); and the like. Generally, these techniques simply call for the application of 
mixtures of the activated monomers to the growing oligonucleotide during the 
coupling steps. Preferably, oligonucleotide tags and tag complements are synthesized 
on a DNA synthesizer having a number of synthesis chambers which is greater than or 
equal to the number of different kinds of words used in the construction of the tags. 
That is, preferably there is a synthesis chamber corresponding to each type of word. 
In this embodiment, words are added nucleotide-by-nucleotide, such that if a word 
consists of five nucleotides there are five monomer couplings in each synthesis 
chamber. After a word is completely synthesized, the synthesis supports are removed 
from the chambers, mixed, and redistributed back to the chambers for the next cycle 
of word addition. This latter embodiment takes advantage of the high coupling yields 
of monomer addition, e.g. in phosphoramidite chemistries. 
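
The chamber-per-word embodiment can be pictured with a small simulation. This is a toy model under stated assumptions: supports are dealt evenly among the chambers after each mixing step, every coupling succeeds, and the function name, bead count, and random seed are illustrative choices rather than parameters from the specification.

```python
import random

def split_and_mix(words, cycles, n_beads, seed=0):
    """Simulate split-and-mix synthesis; returns the tag grown on each bead."""
    rng = random.Random(seed)        # seeded only so the sketch is repeatable
    beads = [""] * n_beads           # every support starts with no tag
    for _ in range(cycles):
        rng.shuffle(beads)           # mix: pool the supports and randomize them
        for i in range(n_beads):     # split: deal supports into one chamber per word
            beads[i] += words[i % len(words)]  # each chamber adds its own word
    return beads

# The eight 4-mer words of the set first listed above (Table II).
words = ["GATT", "TGAT", "TAGA", "TTTG", "GTAA", "AGTA", "ATGT", "AAAG"]
tags = split_and_mix(words, cycles=3, n_beads=2000)
print(len(set(tags)), "distinct tags out of", len(words) ** 3, "possible")
```

Because each support visits exactly one chamber per cycle, every support ends up carrying a single tag sequence, while the population as a whole covers the combinatorial repertoire.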

Double stranded forms of tags may be made by separately synthesizing the 
complementary strands followed by mixing under conditions that permit duplex 
formation. Alternatively, double stranded tags may be formed by first synthesizing a 
single stranded repertoire linked to a known oligonucleotide sequence that serves as a 
primer binding site. The second strand is then synthesized by combining the single 
stranded repertoire with a primer and extending with a polymerase. This latter 
approach is described in Oliphant et al, Gene, 44: 177-183 (1986). Such duplex tags 
may then be inserted into cloning vectors along with target polynucleotides for sorting 
and manipulation of the target polynucleotide in accordance with the invention. 

When tag complements are employed that are made up of nucleotides that 
have enhanced binding characteristics, such as PNAs or oligonucleotide N3'->P5' 
phosphoramidates, sorting can be implemented through the formation of D-loops 
between tags comprising natural nucleotides and their PNA or phosphoramidate 
complements, as an alternative to the "stripping" reaction employing the 3'->5' 
exonuclease activity of a DNA polymerase to render a tag single stranded. 

Oligonucleotide tags of the invention may range in length from 12 to 60 
nucleotides or basepairs. Preferably, oligonucleotide tags range in length from 18 to 
40 nucleotides or basepairs. More preferably, oligonucleotide tags range in length 
from 25 to 40 nucleotides or basepairs. In terms of preferred and more preferred 
numbers of subunits, these ranges may be expressed as follows: 

                                  Table IV

            Numbers of Subunits in Tags in Preferred Embodiments

        Monomers              Nucleotides in Oligonucleotide Tag
        in Subunit      (12-60)           (18-40)           (25-40)

        3               4-20 subunits     6-13 subunits     8-13 subunits
        4               3-15 subunits     4-10 subunits     6-10 subunits
        5               2-12 subunits     3-8 subunits      5-8 subunits
        6               2-10 subunits     3-6 subunits      4-6 subunits
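
The entries of Table IV appear consistent with dividing each nucleotide range by the subunit length and rounding down. That rule is an inference from the tabulated numbers rather than something the text states, and the function name below is illustrative.

```python
def subunit_range(min_nt: int, max_nt: int, subunit_len: int) -> tuple:
    """Whole subunits that fit in the shortest and longest tag of a range
    (rounding down at both ends, an assumed rule inferred from Table IV)."""
    return min_nt // subunit_len, max_nt // subunit_len

tag_ranges = [(12, 60), (18, 40), (25, 40)]
for subunit_len in (3, 4, 5, 6):
    cells = [subunit_range(lo, hi, subunit_len) for lo, hi in tag_ranges]
    print(subunit_len, ["%d-%d subunits" % c for c in cells])
```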



Most preferably, oligonucleotide tags are single stranded and specific hybridization 
occurs via Watson-Crick pairing with a tag complement. 

Preferably, repertoires of single stranded oligonucleotide tags of the invention 
contain at least 100 members; more preferably, repertoires of such tags contain at 
least 1000 members; and most preferably, repertoires of such tags contain at least 
10,000 members. 

Triplex Tags 

In embodiments where specific hybridization occurs via triplex formation, 
coding of tag sequences follows the same principles as for duplex-forming tags; 
however, there are further constraints on the selection of subunit sequences. 
Generally, third strand association via Hoogsteen type of binding is most stable along 
homopyrimidine-homopurine tracks in a double stranded target. Usually, base triplets 
form in T-A*T or C-G*C motifs (where "-" indicates Watson-Crick pairing and 
"*" indicates Hoogsteen type of binding); however, other motifs are also possible. For 
example, Hoogsteen base pairing permits parallel and antiparallel orientations 
between the third strand (the Hoogsteen strand) and the purine-rich strand of the 
duplex to which the third strand binds, depending on conditions and the composition 
of the strands. There is extensive guidance in the literature for selecting appropriate 
sequences, orientation, conditions, nucleoside type (e.g. whether ribose or 
deoxyribose nucleosides are employed), base modifications (e.g. methylated cytosine, 
and the like) in order to maximize, or otherwise regulate, triplex stability as desired in 
particular embodiments, e.g. Roberts et al, Proc. Natl. Acad. Sci., 88: 9397-9401 
(1991); Roberts et al, Science, 258: 1463-1466 (1992); Roberts et al, Proc. Natl. 
Acad. Sci., 93: 4320-4325 (1996); Distefano et al, Proc. Natl. Acad. Sci., 90: 
1179-1183 (1993); Mergny et al, Biochemistry, 30: 9791-9798 (1991); Cheng et al, J. Am. 
Chem. Soc., 114: 4465-4474 (1992); Beal and Dervan, Nucleic Acids Research, 20: 
2773-2776 (1992); Beal and Dervan, J. Am. Chem. Soc., 114: 4976-4982 (1992); 
Giovannangeli et al, Proc. Natl. Acad. Sci., 89: 8631-8635 (1992); Moser and Dervan, 
Science, 238: 645-650 (1987); McShan et al, J. Biol. Chem., 267: 5712-5721 (1992); 
Yoon et al, Proc. Natl. Acad. Sci., 89: 3840-3844 (1992); Blume et al, Nucleic Acids 
Research, 20: 1777-1784 (1992); Thuong and Helene, Angew. Chem. Int. Ed. Engl. 
32: 666-690 (1993); Escude et al, Proc. Natl. Acad. Sci., 93: 4365-4369 (1996); and 
the like. Conditions for annealing single-stranded or duplex tags to their single- 
stranded or duplex complements are well known, e.g. Ji et al, Anal. Chem. 65: 
1323-1328 (1993); Cantor et al, U.S. patent 5,482,836; and the like. Use of triplex tags has 
the advantage of not requiring a "stripping" reaction with polymerase to expose the 
tag for annealing to its complement. 

Preferably, oligonucleotide tags of the invention employing triplex 
hybridization are double stranded DNA and the corresponding tag complements are 
single stranded. More preferably, 5-methylcytosine is used in place of cytosine in the 
tag complements in order to broaden the range of pH stability of the triplex formed 
between a tag and its complement. Preferred conditions for forming triplexes are 
fully disclosed in the above references. Briefly, hybridization takes place in 
concentrated salt solution, e.g. 1.0 M NaCl, 1.0 M potassium acetate, or the like, at 
pH below 5.5 (or 6.5 if 5-methylcytosine is employed). Hybridization temperature 
depends on the length and composition of the tag; however, for an 18-20-mer tag or 
longer, hybridization at room temperature is adequate. Washes may be conducted 
with less concentrated salt solutions, e.g. 10 mM sodium acetate, 100 mM MgCl2, pH 
5.8, at room temperature. Tags may be eluted from their tag complements by 
incubation in a similar salt solution at pH 9.0. 

Minimally cross-hybridizing sets of oligonucleotide tags that form triplexes 
may be generated by the computer program of Appendix Ic, or similar programs. An 
exemplary set of double stranded 8-mer words is listed below in capital letters with 
the corresponding complements in small letters. Each such word differs from each of 
the other words in the set by three base pairs. 

                                  Table V

                  Exemplary Minimally Cross-Hybridizing 
                   Set of Double Stranded 8-mer Tags

        5'-AAGGAGAG    5'-AAAGGGGA    5'-AGAGAAGA    5'-AGGGGGGG
        3'-TTCCTCTC    3'-TTTCCCCT    3'-TCTCTTCT    3'-TCCCCCCC
        3'-ttcctctc    3'-tttcccct    3'-tctcttct    3'-tccccccc

        5'-AAAAAAAA    5'-AAGAGAGA    5'-AGGAAAAG    5'-GAAAGGAG
        3'-TTTTTTTT    3'-TTCTCTCT    3'-TCCTTTTC    3'-CTTTCCTC
        3'-tttttttt    3'-ttctctct    3'-tccttttc    3'-ctttcctc

        5'-AAAAAGGG    5'-AGAAGAGG    5'-AGGAAGGA    5'-GAAGAAGG
        3'-TTTTTCCC    3'-TCTTCTCC    3'-TCCTTCCT    3'-CTTCTTCC
        3'-tttttccc    3'-tcttctcc    3'-tccttcct    3'-cttcttcc

        5'-AAAGGAAG    5'-AGAAGGAA    5'-AGGGGAAA    5'-GAAGAGAA
        3'-TTTCCTTC    3'-TCTTCCTT    3'-TCCCCTTT    3'-CTTCTCTT
        3'-tttccttc    3'-tcttcctt    3'-tccccttt    3'-cttctctt


                                  Table VI

              Repertoire Size of Various Double Stranded Tags 
              That Form Triplexes with Their Tag Complements

Oligonucleotide   Nucleotide Difference      Maximal Size      Size of         Size of
Word Length       between Oligonucleotides   of Minimally      Repertoire      Repertoire
                  of Minimally Cross-        Cross-            with Four       with Five
                  Hybridizing Set            Hybridizing Set   Words           Words

 4                 2                             8             4096            3.2 x 10^4
 6                 3                             8             4096            3.2 x 10^4
 8                 3                            16             6.5 x 10^4      1.05 x 10^6
10                 5                             8             4096
15                 5                            92
20                 6                           765
20                 8                            92
20                10                            22


Preferably, repertoires of double stranded oligonucleotide tags of the invention 
contain at least 10 members; more preferably, repertoires of such tags contain at least 
100 members. Preferably, words are between 4 and 8 nucleotides in length for 
combinatorially synthesized double stranded oligonucleotide tags, and oligonucleotide 
tags are between 12 and 60 base pairs in length. More preferably, such tags are 
between 18 and 40 base pairs in length. 

Solid Phase Supports 
Solid phase supports for use with the invention may have a wide variety of 
forms, including microparticles, beads, and membranes, slides, plates, micromachined 
chips, and the like. Likewise, solid phase supports of the invention may comprise a 
wide variety of compositions, including glass, plastic, silicon, 
alkanethiolate-derivatized gold, cellulose, low cross-linked and high cross-linked polystyrene, silica 
gel, polyamide, and the like. Preferably, either a population of discrete particles is 
employed such that each has a uniform coating, or population, of complementary 
sequences of the same tag (and no other), or a single or a few supports are employed 
with spatially discrete regions each containing a uniform coating, or population, of 
complementary sequences to the same tag (and no other). In the latter embodiment, 
the area of the regions may vary according to particular applications; usually, the 
regions range in area from several μm^2, e.g. 3-5, to several hundred μm^2, e.g. 
100-500. Preferably, such regions are spatially discrete so that signals generated by 
events, e.g. fluorescent emissions, at adjacent regions can be resolved by the detection 
system being employed. In some applications, it may be desirable to have regions 
with uniform coatings of more than one tag complement, e.g. for simultaneous 
sequence analysis, or for bringing separately tagged molecules into close proximity. 
Tag complements may be used with the solid phase support that they are 
synthesized on, or they may be separately synthesized and attached to a solid phase 
support for use, e.g. as disclosed by Lund et al, Nucleic Acids Research, 16: 
10861-10880 (1988); Albretsen et al, Anal. Biochem., 189: 40-50 (1990); Wolf et al, Nucleic 
Acids Research, 15: 2911-2926 (1987); or Ghosh et al, Nucleic Acids Research, 15: 
5353-5372 (1987). Preferably, tag complements are synthesized on and used with the 
same solid phase support, which may comprise a variety of forms and include a 
variety of linking moieties. Such supports may comprise microparticles or arrays, or 
matrices, of regions where uniform populations of tag complements are synthesized. 
A wide variety of microparticle supports may be used with the invention, including 
microparticles made of controlled pore glass (CPG), highly cross-linked polystyrene, 
acrylic copolymers, cellulose, nylon, dextran, latex, polyacrolein, and the like, 
disclosed in the following exemplary references: Meth. Enzymol., Section A, pages 
11-147, vol. 44 (Academic Press, New York, 1976); U.S. patents 4,678,814; 
4,413,070; and 4,046,720; and Pon, Chapter 19, in Agrawal, editor, Methods in 
Molecular Biology, Vol. 20 (Humana Press, Totowa, NJ, 1993). Microparticle 
supports further include commercially available nucleoside-derivatized CPG and 
polystyrene beads (e.g. available from Applied Biosystems, Foster City, CA); 
derivatized magnetic beads; polystyrene grafted with polyethylene glycol (e.g. 
TentaGel(TM), Rapp Polymere, Tubingen, Germany); and the like. Selection of the 
support characteristics, such as material, porosity, size, shape, and the like, and the 
type of linking moiety employed depends on the conditions under which the tags are 
used. For example, in applications involving successive processing with enzymes, 
supports and linkers that minimize steric hindrance of the enzymes and that facilitate 
access to substrate are preferred. Other important factors to be considered in selecting 
the most appropriate microparticle support include size uniformity, efficiency as a 
synthesis support, degree to which surface area is known, and optical properties, e.g. as 
explained more fully below, clear smooth beads provide instrumentational advantages 
when handling large numbers of beads on a surface. 

Exemplary linking moieties for attaching and/or synthesizing tags on 
microparticle surfaces are disclosed in Pon et al, Biotechniques, 6: 768-775 (1988); 
Webb, U.S. patent 4,659,774; Barany et al, International patent application 
PCT/US91/06103; Brown et al, J. Chem. Soc. Commun., 1989: 891-893; Damha et 
al, Nucleic Acids Research, 18: 3813-3821 (1990); Beattie et al, Clinical Chemistry, 
39: 719-722 (1993); Maskos and Southern, Nucleic Acids Research, 20: 1679-1684 
(1992); and the like. 

As mentioned above, tag complements may also be synthesized on a single 
(or a few) solid phase support to form an array of regions uniformly coated with tag 
complements. That is, within each region in such an array the same tag complement 
is synthesized. Techniques for synthesizing such arrays are disclosed in McGall et al, 
International application PCT/US93/03767; Pease et al, Proc. Natl. Acad. Sci., 91: 
5022-5026 (1994); Southern and Maskos, International application 
PCT/GB89/01114; Maskos and Southern (cited above); Southern et al, Genomics, 13: 
1008-1017 (1992); and Maskos and Southern, Nucleic Acids Research, 21: 
4663-4669 (1993). 

Preferably, the invention is implemented with microparticles or beads 
uniformly coated with complements of the same tag sequence. Microparticle supports 
and methods of covalently or noncovalently linking oligonucleotides to their surfaces 
are well known, as exemplified by the following references: Beaucage and Iyer (cited 
above); Gait, editor, Oligonucleotide Synthesis: A Practical Approach (IRL Press, 
Oxford, 1984); and the references cited above. Generally, the size and shape of a 
microparticle is not critical; however, microparticles in the size range of a few, e.g. 
1-2, to several hundred, e.g. 200-1000, μm diameter are preferable, as they facilitate the 
construction and manipulation of large repertoires of oligonucleotide tags with 
minimal reagent and sample usage. 

In some preferred applications, commercially available controlled-pore glass 
(CPG) or polystyrene supports are employed as solid phase supports in the invention. 
Such supports come available with base-labile linkers and initial nucleosides attached, 
e.g. Applied Biosystems (Foster City, CA). Preferably, microparticles having pore 
size between 500 and 1000 angstroms are employed. 

In other preferred applications, non-porous microparticles are employed for 
their optical properties, which may be advantageously used when tracking large 
numbers of microparticles on planar supports, such as a microscope slide. 
Particularly preferred non-porous microparticles are the glycidyl methacrylate (GMA) 
beads available from Bangs Laboratories (Carmel, IN). Such microparticles are 
useful in a variety of sizes and derivatized with a variety of linkage groups for 
synthesizing tags or tag complements. Preferably, for massively parallel 
manipulations of tagged microparticles, 5 μm diameter GMA beads are employed. 




Attaching Tags to Polynucleotides 
For Sorting onto Solid Phase Supports 
An important aspect of the invention is the sorting and attachment of 
populations of polynucleotides, e.g. from a cDNA library, to microparticles or to 
separate regions on a solid phase support such that each microparticle or region has 
substantially only one kind of polynucleotide attached. This objective is 
accomplished by ensuring that substantially all different polynucleotides have 
different tags attached. This condition, in turn, is brought about by taking a sample of 
the full ensemble of tag-polynucleotide conjugates for analysis. (It is acceptable that 
identical polynucleotides have different tags, as it merely results in the same 
polynucleotide being operated on or analyzed twice in two different locations.) Such 
sampling can be carried out overtly--for example, by taking a small volume 
from a larger mixture--after the tags have been attached to the polynucleotides; it can 
be carried out inherently as a secondary effect of the techniques used to process the 
polynucleotides and tags; or sampling can be carried out both overtly and as an 
inherent part of processing steps. 

Preferably, in constructing a cDNA library where substantially all different 
cDNAs have different tags, a tag repertoire is employed whose complexity, or number 
of distinct tags, greatly exceeds the total number of mRNAs extracted from a cell or 
tissue sample. Preferably, the complexity of the tag repertoire is at least 10 times that 
of the polynucleotide population; and more preferably, the complexity of the tag 
repertoire is at least 100 times that of the polynucleotide population. Below, a 
protocol is disclosed for cDNA library construction using a primer mixture that 
contains a full repertoire of exemplary 9-word tags. Such a mixture of tag-containing 
primers has a complexity of 8^9, or about 1.34 x 10^8. As indicated by Winslow et al, 
Nucleic Acids Research, 19: 3251-3253 (1991), mRNA for library construction can 
be extracted from as few as 10-100 mammalian cells. Since a single mammalian cell 
contains about 5 x 10^5 copies of mRNA molecules of about 3.4 x 10^4 different kinds, 
by standard techniques one can isolate the mRNA from about 100 cells, or 
(theoretically) about 5 x 10^7 mRNA molecules. Comparing this number to the 
complexity of the primer mixture shows that without any additional steps, and even 
assuming that mRNAs are converted into cDNAs with perfect efficiency (1% 
efficiency or less is more accurate), the cDNA library construction protocol results in 
a population containing no more than 37% of the total number of different tags. That 
is, without any overt sampling step at all, the protocol inherently generates a sample 
that comprises 37%, or less, of the tag repertoire. The probability of obtaining a 
double under these conditions is about 5%, which is within the preferred range. With 
mRNA from 10 cells, the fraction of the tag repertoire sampled is reduced to only 
3.7%, even assuming that all the processing steps take place at 100% efficiency. In 
fact, the efficiencies of the processing steps for constructing cDNA libraries are very 
low, a "rule of thumb" being that a good library should contain about 10^8 cDNA clones 
from mRNA extracted from 10^6 mammalian cells. 
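
The sampled fractions quoted above can be reproduced with a short calculation. The sketch assumes that tags are attached uniformly at random, so that the number of conjugates sharing a given tag is approximately Poisson distributed, and it reads the "probability of obtaining a double" as the Poisson probability of exactly two conjugates per tag; both points are interpretive assumptions, and the function name is illustrative.

```python
import math

def poisson(r: int, lam: float) -> float:
    """Poisson probability of exactly r events when the mean is lam."""
    return math.exp(-lam) * lam ** r / math.factorial(r)

repertoire = 8 ** 9        # 9-word tags built from a set of 8 words
mrna_per_cell = 5e5        # approximate mRNA copies per mammalian cell

for cells in (100, 10):
    molecules = cells * mrna_per_cell
    fraction = molecules / repertoire   # mean conjugates per tag, perfect efficiency
    print(f"{cells} cells: fraction of repertoire = {fraction:.3f}, "
          f"P(double) = {poisson(2, fraction):.4f}")
```

With realistic conversion efficiencies the number of conjugates, and hence both quantities, would be lower still.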
When larger amounts of mRNA are used in the above protocol, or for larger amounts 
of polynucleotides in general, where the number of such molecules exceeds the 
complexity of the tag repertoire, a tag-polynucleotide conjugate mixture potentially 
contains every possible pairing of tags and types of mRNA or polynucleotide. In such 
cases, overt sampling may be implemented by removing a sample volume after a 
serial dilution of the starting mixture of tag-polynucleotide conjugates. The amount 
of dilution required depends on the amount of starting material and the efficiencies of 
the processing steps, which are readily estimated. 

If mRNA were extracted from 10^6 cells (which would correspond to about 0.5
µg of poly(A)+ RNA), and if primers were present in about 10-100 fold concentration
excess, as is called for in a typical protocol, e.g. Sambrook et al, Molecular Cloning,
Second Edition, page 8.61 [10 µL 1.8 kb mRNA at 1 mg/mL equals about 1.68 x 10^-11
moles and 10 µL 18-mer primer at 1 mg/mL equals about 1.68 x 10^-9 moles], then the
total number of tag-polynucleotide conjugates in a cDNA library would simply be
equal to or less than the starting number of mRNAs, or about 5 x 10^11 vectors
containing tag-polynucleotide conjugates. Again, this assumes that each step in cDNA
construction (first strand synthesis, second strand synthesis, ligation into a vector)
occurs with perfect efficiency, which is a very conservative estimate. The actual
number is significantly less.

If a sample of n tag-polynucleotide conjugates is randomly drawn from a
reaction mixture, as could be effected by taking a sample volume, the probability of
drawing conjugates having the same tag is described by the Poisson distribution,
P(r) = e^(-λ) λ^r / r!, where r is the number of conjugates having the same tag and λ = np,
where p is the probability of a given tag being selected. If n = 10^6 and p = 1/(1.34 x
10^8), then λ = .00746 and P(2) = 2.76 x 10^-5. Thus, a sample of one million molecules
gives rise to an expected number of doubles well within the preferred range. Such a
sample is readily obtained as follows: Assume that the 5 x 10^11 mRNAs are perfectly
converted into 5 x 10^11 vectors with tag-cDNA conjugates as inserts and that the 5 x
10^11 vectors are in a reaction solution having a volume of 100 µl. Four 10-fold serial
dilutions may be carried out by transferring 10 µl from the original solution into a
vessel containing 90 µl of an appropriate buffer, such as TE. This process may be
repeated for three additional dilutions to obtain a 100 µl solution containing 5 x 10^5
vector molecules per µl. A 2 µl aliquot from this solution yields 10^6 vectors
containing tag-cDNA conjugates as inserts. This sample is then amplified by
straightforward transformation of a competent host cell followed by culturing.
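
As a check on the Poisson figures quoted in the paragraph above, the following sketch evaluates the distribution for n = 10^6 and p = 1/(1.34 x 10^8). The helper name poisson_pmf is illustrative and is not taken from the specification.

```python
import math

def poisson_pmf(r, lam):
    """Poisson probability of exactly r events when the mean is lam."""
    return math.exp(-lam) * lam ** r / math.factorial(r)

n = 10 ** 6            # conjugates drawn in the sample
p = 1 / 1.34e8         # probability of a given tag, using the rounded repertoire size
lam = n * p            # about 0.00746

print(f"lambda = {lam:.5f}")
print(f"P(2)   = {poisson_pmf(2, lam):.3g}")
```

With these inputs the mean is about 0.00746 and P(2) is about 2.76 x 10^-5, matching the values in the text.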

Of course, as mentioned above, no step in the above process proceeds with
perfect efficiency. In particular, when vectors are employed to amplify a sample of
tag-polynucleotide conjugates, the step of transforming a host is very inefficient.
Usually, no more than 1% of the vectors are taken up by the host and replicated.
Thus, for such a method of amplification, even fewer dilutions would be required to
obtain a sample of 10^6 conjugates.
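
The dilution arithmetic, and the effect of an inefficient transformation step on the number of dilutions needed, can be sketched as follows. The function and its efficiency parameter are illustrative assumptions; the 1% figure is the upper bound on vector uptake given in the text.

```python
def conjugates_in_aliquot(total_vectors, volume_ul, n_dilutions, aliquot_ul,
                          dilution_factor=10, efficiency=1.0):
    """Expected number of conjugates recovered from an aliquot.

    total_vectors: vectors in the starting solution
    volume_ul:     starting volume in microliters
    n_dilutions:   number of serial dilutions performed
    efficiency:    fraction of vectors taken up and replicated by the host
    """
    per_ul = total_vectors / volume_ul             # starting concentration
    per_ul /= dilution_factor ** n_dilutions       # after serial dilution
    return per_ul * aliquot_ul * efficiency

# Perfect efficiency: four 10-fold dilutions, 2 µl aliquot
print(conjugates_in_aliquot(5e11, 100, 4, 2))

# 1% transformation efficiency: two dilutions give the same sample size
print(conjugates_in_aliquot(5e11, 100, 2, 2, efficiency=0.01))
```

Under these assumptions, a 1% uptake rate removes two of the four 10-fold dilutions for the same final sample of about 10^6 conjugates.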

A repertoire of oligonucleotide tags can be conjugated to a population of
polynucleotides in a number of ways, including direct enzymatic ligation,
amplification, e.g. via PCR, using primers containing the tag sequences, and the like.
The initial ligating step produces a very large population of tag-polynucleotide
conjugates such that a single tag is generally attached to many different
polynucleotides. However, as noted above, by taking a sufficiently small sample of
the conjugates, the probability of obtaining "doubles," i.e. the same tag on two
different polynucleotides, can be made negligible. Generally, the larger the sample
the greater the probability of obtaining a double. Thus, a design trade-off exists
between selecting a large sample of tag-polynucleotide conjugates, which, for
example, ensures adequate coverage of a target polynucleotide in a shotgun
sequencing operation or adequate representation of a rapidly changing mRNA pool,
and selecting a small sample which ensures that a minimal number of doubles will be
present. In most embodiments, the presence of doubles merely adds an additional
source of noise or, in the case of sequencing, a minor complication in scanning and
signal processing, as microparticles giving multiple fluorescent signals can simply be
ignored.

As used herein, the term "substantially all" in reference to attaching tags to
molecules, especially polynucleotides, is meant to reflect the statistical nature of the
sampling procedure employed to obtain a population of tag-molecule conjugates
essentially free of doubles. The meaning of substantially all in terms of actual
percentages of tag-molecule conjugates depends on how the tags are being employed.
Preferably, for nucleic acid sequencing, substantially all means that at least eighty
percent of the polynucleotides have unique tags attached. More preferably, it means
that at least ninety percent of the polynucleotides have unique tags attached. Still
more preferably, it means that at least ninety-five percent of the polynucleotides have
unique tags attached. And, most preferably, it means that at least ninety-nine percent 
of the polynucleotides have unique tags attached. 

Preferably, when the population of polynucleotides consists of messenger
RNA (mRNA), oligonucleotide tags may be attached by reverse transcribing the
mRNA with a set of primers preferably containing complements of tag sequences.
An exemplary set of such primers could have the following sequence (SEQ ID NO:
1):

      5'-mRNA-[A]n-3'
               [T]19GG[W,W,W,C]9AC CAGCTG ATC-5'-biotin

where "[W,W,W,C]9" represents the sequence of an oligonucleotide tag of nine
subunits of four nucleotides each and "[W,W,W,C]" represents the subunit sequences
listed above, i.e. "W" represents T or A. The underlined sequences identify an
optional restriction endonuclease site that can be used to release the polynucleotide
from attachment to a solid phase support via the biotin, if one is employed. For the
above primer, the complement attached to a microparticle could have the form:

      5'-[G,W,W,W]9TGG-linker-microparticle
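
The size of the tag repertoire follows from the word structure. The sketch below assumes, for illustration only, that every combination of three W positions (A or T) followed by C is a permitted word; the actual word table referred to in the text is not reproduced in this section, so the enumeration is hypothetical. The function names are likewise illustrative.

```python
from itertools import product

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def tag_words():
    """All 4-nucleotide words of the form W,W,W,C with W = A or T (assumed set)."""
    return ["".join(w) + "C" for w in product("AT", repeat=3)]

def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

words = tag_words()
print(len(words), "words:", words)
print("9-word repertoire:", len(words) ** 9)
print("complement of", words[0], "is", reverse_complement(words[0]))
```

Eight words give 8^9 nine-word tags, and each word's complement has the form G,W,W,W, consistent with the tag complement shown on the microparticle.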

After reverse transcription, the mRNA is removed, e.g. by RNase H digestion,
and the second strand of the cDNA is synthesized using, for example, a primer of the
following form (SEQ ID NO: 2):

      5'-NRRGATCYNNN-3'

where N is any one of A, T, G, or C; R is a purine-containing nucleotide, and Y is a
pyrimidine-containing nucleotide. This particular primer creates a Bst YI restriction
site in the resulting double stranded DNA which, together with the Sal I site,
facilitates cloning into a vector with, for example, Bam HI and Xho I sites. After Bst
YI and Sal I digestion, the exemplary conjugate would have the form:

      5'-RCGACCA[C,W,W,W]9GG[T]19- cDNA -NNNR
             GGT[G,W,W,W]9CC[A]19- rDNA -NNNYCTAG-5'

The polynucleotide-tag conjugates may then be manipulated using standard molecular 
biology techniques. For example, the above conjugate, which is actually a mixture,
may be inserted into commercially available cloning vectors, e.g. Stratagene Cloning
System (La Jolla, CA); transfected into a host, such as a commercially available host
bacteria; which is then cultured to increase the number of conjugates. The cloning
vectors may then be isolated using standard techniques, e.g. Sambrook et al,
Molecular Cloning, Second Edition (Cold Spring Harbor Laboratory, New York,
1989). Alternatively, appropriate adaptors and primers may be employed so that the 
conjugate population can be increased by PCR. 

Preferably, when the ligase-based method of sequencing is employed, the Bst
YI and Sal I digested fragments are cloned into a Bam HI-/Xho I-digested vector
having the following single-copy restriction sites (SEQ ID NO: 3):

      5'-GA GGATG CCTTTAT GGATCC A CTCGAG ATCCCAATCCA-3'
            Fok I         Bam HI   Xho I

This adds the Fok I site which will allow initiation of the sequencing process 
discussed more fully below. 

Tags can be conjugated to cDNAs of existing libraries by standard cloning 
methods. cDNAs are excised from their existing vector, isolated, and then ligated into
a vector containing a repertoire of tags. Preferably, the tag-containing vector is
linearized by cleaving with two restriction enzymes so that the excised cDNAs can be
ligated in a predetermined orientation. The concentration of the linearized tag- 
containing vector is in substantial excess over that of the cDNA inserts so that 
ligation provides an inherent sampling of tags. 

A general method for exposing the single stranded tag after amplification
involves digesting a target polynucleotide-containing conjugate with the 3'→5'
exonuclease activity of T4 DNA polymerase, or a like enzyme. When used in the
presence of a single deoxynucleoside triphosphate, such a polymerase will cleave
nucleotides from 3' recessed ends present on the non-template strand of a double
stranded fragment until a complement of the single deoxynucleoside triphosphate is
reached on the template strand. When such a nucleotide is reached the 3'→5'
digestion effectively ceases, as the polymerase's extension activity adds nucleotides at
a higher rate than the excision activity removes nucleotides. Consequently, single
stranded tags constructed with three nucleotides are readily prepared for loading onto
solid phase supports.

The technique may also be used to preferentially methylate interior Fok I sites 
of a target polynucleotide while leaving a single Fok I site at the terminus of the 
polynucleotide unmethylated. First, the terminal Fok I site is rendered single stranded
using a polymerase with deoxycytidine triphosphate. The double stranded portion of
the fragment is then methylated, after which the single stranded terminus is filled in
with a DNA polymerase in the presence of all four nucleoside triphosphates, thereby
regenerating the Fok I site. Clearly, this procedure can be generalized to
endonucleases other than Fok I.

After the oligonucleotide tags are prepared for specific hybridization, e.g. by 
rendering them single stranded as described above, the polynucleotides are mixed 
with microparticles containing the complementary sequences of the tags under 
conditions that favor the formation of perfectly matched duplexes between the tags
and their complements. There is extensive guidance in the literature for creating these
conditions. Exemplary references providing such guidance include Wetmur, Critical
Reviews in Biochemistry and Molecular Biology, 26: 227-259 (1991); Sambrook et
al, Molecular Cloning: A Laboratory Manual, 2nd Edition (Cold Spring Harbor
Laboratory, New York, 1989); and the like. Preferably, the hybridization conditions
are sufficiently stringent so that only perfectly matched sequences form stable
duplexes. Under such conditions the polynucleotides specifically hybridized through
their tags may be ligated to the complementary sequences attached to the 
microparticles. Finally, the microparticles are washed to remove polynucleotides with 
unligated and/or mismatched tags. 

When CPG microparticles conventionally employed as synthesis supports are
used, the density of tag complements on the microparticle surface is typically greater
than that necessary for some sequencing operations. That is, in sequencing
approaches that require successive treatment of the attached polynucleotides with a
variety of enzymes, densely spaced polynucleotides may tend to inhibit access of the
relatively bulky enzymes to the polynucleotides. In such cases, the polynucleotides
are preferably mixed with the microparticles so that tag complements are present in
significant excess, e.g. from 10:1 to 100:1, or greater, over the polynucleotides. This
ensures that the density of polynucleotides on the microparticle surface will not be so
high as to inhibit enzyme access. Preferably, the average inter-polynucleotide spacing
on the microparticle surface is on the order of 30-100 nm. Guidance in selecting
ratios for standard CPG supports and Ballotini beads (a type of solid glass support) is
found in Maskos and Southern, Nucleic Acids Research, 20: 1679-1684 (1992).
Preferably, for sequencing applications, standard CPG beads of diameter in the range
of 20-50 µm are loaded with about 10^5 polynucleotides, and GMA beads of diameter
in the range of 5-10 µm are loaded with a few tens of thousand of polynucleotides,
e.g. 4 x 10^4 to 6 x 10^4.

In the preferred embodiment, tag complements are synthesized on
microparticles combinatorially; thus, at the end of the synthesis, one obtains a
complex mixture of microparticles from which a sample is taken for loading tagged
polynucleotides. The size of the sample of microparticles will depend on several
factors, including the size of the repertoire of tag complements, the nature of the
apparatus used for observing loaded microparticles, e.g. its capacity, the tolerance
for multiple copies of microparticles with the same tag complement (i.e. "bead
doubles"), and the like. The following table provides guidance regarding
microparticle sample size, microparticle diameter, and the approximate physical
dimensions of a packed array of microparticles of various diameters.

Microparticle diameter          5 µm           10 µm        20 µm          40 µm

Max. no. polynucleotides
loaded at 1 per 10^5 sq.
angstrom                                       3 x 10^5     1.26 x 10^6    5 x 10^6

Approx. area of monolayer
of 10^6 microparticles          .45 x .45 cm   1 x 1 cm     2 x 2 cm       4 x 4 cm
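
The loading figures in the table are consistent with dividing the surface area of a sphere by 10^5 square angstroms per polynucleotide, and the monolayer dimensions with roughly one bead diameter per bead along each side. The sketch below reproduces that arithmetic; the assumption of a smooth sphere and of simple square packing is illustrative and is not stated in the specification.

```python
import math

ANGSTROM_PER_UM = 1e4
CM_PER_UM = 1e-4

def max_loading(diameter_um, sq_angstrom_per_molecule=1e5):
    """Polynucleotides that fit on a smooth sphere at the given surface density."""
    d = diameter_um * ANGSTROM_PER_UM
    return math.pi * d ** 2 / sq_angstrom_per_molecule

def monolayer_side_cm(diameter_um, n_beads=1e6):
    """Side of a square monolayer, assuming each bead occupies a d x d cell."""
    return math.sqrt(n_beads) * diameter_um * CM_PER_UM

for d in (5, 10, 20, 40):
    print(f"{d:>2} µm: loading = {max_loading(d):.3g}, "
          f"side = {monolayer_side_cm(d):.2f} cm")
```

Square packing gives 0.5 cm for 5 µm beads; the slightly smaller .45 cm figure in the table would correspond to a tighter packing.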

The probability that the sample of microparticles contains a given tag complement or
is present in multiple copies is described by the Poisson distribution, as indicated in
the following table.

Table VII
High Specificity Sorting and Panning
The kinetics of sorting depends on the rate of hybridization of oligonucleotide
tags to their tag complements which, in turn, depends on the complexity of the tags in
the hybridization reaction. Thus, a trade-off exists between sorting rate and tag
complexity, such that an increase in sorting rate may be achieved at the cost of
reducing the complexity of the tags involved in the hybridization reaction. As
explained below, the effects of this trade-off may be ameliorated by "panning."

Specificity of the hybridizations may be increased by taking a sufficiently
small sample so that both a high percentage of tags in the sample are unique and the
nearest neighbors of substantially all the tags in a sample differ by at least two words.
This latter condition may be met by taking a sample that contains a number of tag-
polynucleotide conjugates that is about 0.1 percent or less of the size of the repertoire
being employed. For example, if tags are constructed with eight words selected from
Table II, a repertoire of 8^8, or about 1.67 x 10^7, tags and tag complements are
produced. In a library of tag-cDNA conjugates as described above, a 0.1 percent
sample means that about 16,700 different tags are present. If this were loaded directly
onto a repertoire-equivalent of microparticles, or in this example a sample of 1.67 x
10^7 microparticles, then only a sparse subset of the sampled microparticles would be
loaded. The density of loaded microparticles can be increased, for example, for more
efficient sequencing, by undertaking a "panning" step in which the sampled tag-
cDNA conjugates are used to separate loaded microparticles from unloaded
microparticles. Thus, in the example above, even though a "0.1 percent" sample
contains only 16,700 cDNAs, the sampling and panning steps may be repeated until
as many loaded microparticles as desired are accumulated.

A panning step may be implemented by providing a sample of tag-cDNA 
conjugates each of which contains a capture moiety at an end opposite, or distal to, 
the oligonucleotide tag. Preferably, the capture moiety is of a type which can be
released from the tag-cDNA conjugates, so that the tag-cDNA conjugates can be
sequenced with a single-base sequencing method. Such moieties may comprise
biotin, digoxigenin, or like ligands, a triplex binding region, or the like. Preferably,
such a capture moiety comprises a biotin component. Biotin may be attached to tag-
cDNA conjugates by a number of standard techniques. If appropriate adapters
containing PCR primer binding sites are attached to tag-cDNA conjugates, biotin may
be attached by using a biotinylated primer in an amplification after sampling.
Alternatively, if the tag-cDNA conjugates are inserts of cloning vectors, biotin may be
attached after excising the tag-cDNA conjugates by digestion with an appropriate
restriction enzyme followed by isolation and filling in a protruding strand distal to the
tags with a DNA polymerase in the presence of biotinylated uridine triphosphate. 

After a tag-cDNA conjugate is captured, it may be released from the biotin 
moiety in a number of ways, such as by a chemical linkage that is cleaved by 
reduction, e.g. Herman et al, Anal. Biochem., 156: 48-55 (1986), or that is cleaved
photochemically, e.g. Olejnik et al, Nucleic Acids Research, 24: 361-366 (1996), or
that is cleaved enzymatically by introducing a restriction site in the PCR primer. The
latter embodiment can be exemplified by considering the library of tag-polynucleotide
conjugates described above:

      5'-RCGACCA[C,W,W,W]9GG[T]19- cDNA -NNNR
             GGT[G,W,W,W]9CC[A]19- rDNA -NNNYCTAG-5'

The following adapters may be ligated to the ends of these fragments to permit
amplification by PCR:

      5'-XXXXXXXXXXXXXXXXXXXX
         XXXXXXXXXXXXXXXXXXXXYGAT

              Right Adapter

      GATCZZACTAGTZZZZZZZZZZZZ-3'
          ZZTGATCAZZZZZZZZZZZZ

              Left Adapter

          ZZTGATCAZZZZZZZZZZZZ-5'-biotin

              Left Primer

where "ACTAGT" is a Spe I recognition site (which leaves a staggered cleavage 
ready for single base sequencing), and the X's and Z's are nucleotides selected so that
the annealing and dissociation temperatures of the respective primers are
approximately the same. After ligation of the adapters and amplification by PCR
using the biotinylated primer, the tags of the conjugates are rendered single stranded
by the exonuclease activity of T4 DNA polymerase and conjugates are combined with
a sample of microparticles, e.g. a repertoire equivalent, with tag complements
attached. After annealing under stringent conditions (to minimize mis-attachment of
tags), the conjugates are preferably ligated to their tag complements and the loaded 
microparticles are separated from the unloaded microparticles by capture with 
avidinated magnetic beads, or like capture technique. 

Returning to the example, this process results in the accumulation of about
10,500 (= 16,700 x .63) loaded microparticles with different tags, which may be
released from the magnetic beads by cleavage with Spe I. By repeating this process
40-50 times with new samples of microparticles and tag-cDNA conjugates, 4-5 x 10^5
cDNAs can be accumulated by pooling the released microparticles. The pooled
microparticles may then be simultaneously sequenced by a single-base sequencing
technique.
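
The accumulation figures in this example can be reproduced with a short sketch. It assumes that the factor .63 is the Poisson probability, 1 - e^-1, that a repertoire-equivalent sample of microparticles contains at least one bead bearing a given tag complement; the specification gives the factor without deriving it, so that reading is an interpretation. Variable and function names are illustrative.

```python
import math

sampled_tags = 16_700                     # 0.1 percent sample, as given in the text
p_bead_present = 1 - math.exp(-1)         # chance a given tag complement is in the bead sample
loaded_per_round = sampled_tags * p_bead_present

def accumulated(rounds):
    """Loaded microparticles pooled after repeating sampling and panning."""
    return rounds * loaded_per_round

print(f"loaded per round: {loaded_per_round:,.0f}")
for rounds in (40, 50):
    print(f"{rounds} rounds: {accumulated(rounds):,.0f}")
```

Each round yields roughly 10,500 loaded microparticles, so 40 to 50 rounds give on the order of 4-5 x 10^5.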

Determining how many times to repeat the sampling and panning steps, or
more generally, determining how many cDNAs to analyze, depends on one's
objective. If the objective is to monitor the changes in abundance of relatively
common sequences, e.g. making up 5% or more of a population, then relatively small
samples, i.e. a small fraction of the total population size, may allow statistically
significant estimates of relative abundances. On the other hand, if one seeks to
monitor the abundances of rare sequences, e.g. making up 0.1% or less of a
population, then large samples are required. Generally, there is a direct relationship
between sample size and the reliability of the estimates of relative abundances based
on the sample. There is extensive guidance in the literature on determining
appropriate sample sizes for making reliable statistical estimates, e.g. Koller et al,
Nucleic Acids Research, 23: 185-191 (1994); Good, Biometrika, 40: 16-264 (1953);
Bunge et al, J. Am. Stat. Assoc., 88: 364-373 (1993); and the like. Preferably, for
monitoring changes in gene expression based on the analysis of a series of cDNA
libraries containing 10^5 to 10^8 independent clones of 3.0-3.5 x 10^4 different
sequences, a sample of at least 10^4 sequences is accumulated for analysis of each
library. More preferably, a sample of at least 10^5 sequences is accumulated for the
analysis of each library; and most preferably, a sample of at least 5 x 10^5 sequences
is accumulated for the analysis of each library. Alternatively, the number of
sequences sampled is preferably sufficient to estimate the relative abundance of a 
sequence present at a frequency within the range of 0.1% to 5% with a 95% 
confidence limit no larger than 0.1% of the population size. 
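
One way to turn the last criterion into a number is the usual normal approximation to the binomial. The sketch below assumes that the "95% confidence limit no larger than 0.1%" means a confidence interval half-width of 0.001 in absolute frequency; that reading, the choice of approximation, and the function name are assumptions and are not taken from the specification or the cited references.

```python
import math

def required_sample_size(frequency, half_width, z=1.96):
    """Sequences needed so that a 95% interval on `frequency` has the given half-width.

    Uses the normal approximation: half_width = z * sqrt(f * (1 - f) / n).
    """
    return math.ceil(z * z * frequency * (1 - frequency) / half_width ** 2)

for f in (0.001, 0.01, 0.05):
    n = required_sample_size(f, half_width=0.001)
    print(f"frequency {f:.3f}: about {n:,} sequences")
```

Under these assumptions the most demanding case in the stated range, a sequence at 5% abundance, calls for a sample on the order of 2 x 10^5 sequences.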

Single Base DNA Sequencing
The present invention can be employed with conventional methods of DNA
sequencing, e.g. as disclosed by Hultman et al, Nucleic Acids Research, 17: 4937-
4946 (1989). However, for parallel, or simultaneous, sequencing of multiple
polynucleotides, a DNA sequencing methodology is preferred that requires neither
electrophoretic separation of closely sized DNA fragments nor analysis of cleaved
nucleotides by a separate analytical procedure, as in peptide sequencing. Preferably,
the methodology permits the stepwise identification of nucleotides, usually one at a
time, in a sequence through successive cycles of treatment and detection. Such
methodologies are referred to herein as "single base" sequencing methods. Single
base approaches are disclosed in the following references: Cheeseman, U.S. patent 
5,302,509; Tsien et al. International application WO 91/06678; Rosenthal et al, 
International application WO 93/21340; Canard et al. Gene, 148: 1-6 (1994); and 
Metzker et al, Nucleic Acids Research, 22: 4259-4267 (1994). 

A "single base" method of DNA sequencing which is suitable for use with the
present invention and which requires no electrophoretic separation of DNA fragments
is described in International application PCT/US95/03678. Briefly, the method
comprises the following steps: (a) ligating a probe to an end of the polynucleotide
having a protruding strand to form a ligated complex, the probe having a
complementary protruding strand to that of the polynucleotide and the probe having a
nuclease recognition site; (b) removing unligated probe from the ligated complex; (c)
identifying one or more nucleotides in the protruding strand of the polynucleotide by
the identity of the ligated probe; (d) cleaving the ligated complex with a nuclease; and
(e) repeating steps (a) through (d) until the nucleotide sequence of the polynucleotide,
or a portion thereof, is determined.

A single signal generating moiety, such as a single fluorescent dye, may be
employed when sequencing several different target polynucleotides attached to
different spatially addressable solid phase supports, such as fixed microparticles, in a
parallel sequencing operation. This may be accomplished by providing four sets of
probes that are applied sequentially to the plurality of target polynucleotides on the
different microparticles. An exemplary set of such probes is shown below:

      Set 1            Set 2            Set 3            Set 4

      ANNNN...NN       dANNNN...NN      dANNNN...NN      dANNNN...NN
       N...NNTT...T*    N...NNTT...T     N...NNTT...T     N...NNTT...T

      dCNNNN...NN      CNNNN...NN       dCNNNN...NN      dCNNNN...NN
       N...NNTT...T     N...NNTT...T*    N...NNTT...T     N...NNTT...T

      dGNNNN...NN      dGNNNN...NN      GNNNN...NN       dGNNNN...NN
       N...NNTT...T     N...NNTT...T     N...NNTT...T*    N...NNTT...T

      dTNNNN...NN      dTNNNN...NN      dTNNNN...NN      TNNNN...NN
       N...NNTT...T     N...NNTT...T     N...NNTT...T     N...NNTT...T*

where each of the listed probes represents a mixture of 4^3=64 oligonucleotides such
that the identity of the 3' terminal nucleotide of the top strand is fixed and the other
positions in the protruding strand are filled by every 3-mer permutation of nucleotides,
or complexity reducing analogs. The listed probes are also shown with a single
stranded poly-T tail with a signal generating moiety attached to the terminal thymidine,
shown as "T*". The "d" on the unlabeled probes designates a ligation-blocking moiety
or absence of 3'-hydroxyl, which prevents unlabeled probes from being ligated.
Preferably, such 3'-terminal nucleotides are dideoxynucleotides. In this embodiment,
the probes of set 1 are first applied to the plurality of target polynucleotides and treated
with a ligase so that target polynucleotides having a thymidine complementary to the 3' 
terminal adenosine of the labeled probes are ligated. The unlabeled probes are 
simultaneously applied to minimize inappropriate ligations. The locations of the target 
polynucleotides that form ligated complexes with probes terminating in "A" are 
identified by the signal generated by the label carried on the probe. After washing and 
cleavage, the probes of set 2 are applied. In this case, target polynucleotides forming 
ligated complexes with probes terminating in "C" are identified by location. Similarly, 
the probes of sets 3 and 4 are applied and locations of positive signals identified. This 
process of sequentially applying the four sets of probes continues until the desired 
number of nucleotides are identified on the target polynucleotides. Clearly, one of 
ordinary skill could construct similar sets of probes that could have many variations, 
such as having protruding strands of different lengths, different moieties to block 
ligation of unlabeled probes, different means for labeling probes, and the like. 
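
The probe scheme described above can be illustrated with a small sketch. It assumes a four-nucleotide protruding strand whose 3' terminal base is fixed and whose remaining three positions are fully degenerate, and it models only which set carries the labeled probe for a given target base. The function names are illustrative and do not come from the specification.

```python
from itertools import product

BASES = "ACGT"                      # set 1 labels A, set 2 labels C, and so on
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def probe_mixture(terminal_base, overhang_len=4):
    """All protruding strands with a fixed 3' terminal base and degenerate remainder."""
    return [terminal_base + "".join(rest)
            for rest in product(BASES, repeat=overhang_len - 1)]

def call_base(target_base):
    """Return (set number, labeled probe terminal base) that gives a signal."""
    for set_number, labeled in enumerate(BASES, start=1):
        if COMPLEMENT[labeled] == target_base:
            return set_number, labeled
    raise ValueError(f"unexpected base: {target_base!r}")

print(len(probe_mixture("A")), "oligonucleotides per listed probe")
for base in "TGCA":
    print("target", base, "-> signal in set", call_base(base)[0])
```

A target thymidine gives a signal only with set 1, whose labeled probes terminate in adenosine; in the other three sets the A-terminated probes are blocked and unlabeled.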
Apparatus for Sequencing Populations of Polynucleotides
An objective of the invention is to sort identical molecules, particularly
polynucleotides, onto the surfaces of microparticles by the specific hybridization of
tags and their complements. Once such sorting has taken place, the presence of the
molecules or operations performed on them can be detected in a number of ways
depending on the nature of the tagged molecule, whether microparticles are detected
separately or in "batches," whether repeated measurements are desired, and the like.
Typically, the sorted molecules are exposed to ligands for binding, e.g. in drug
development, or are subjected to chemical or enzymatic processes, e.g. in polynucleotide
sequencing. In both of these uses it is often desirable to simultaneously observe
signals corresponding to such events or processes on large numbers of microparticles.
Microparticles carrying sorted molecules (referred to herein as "loaded"
microparticles) lend themselves to such large scale parallel operations, e.g. as
demonstrated by Lam et al (cited above).

Preferably, whenever light-generating signals, e.g. chemiluminescent,
fluorescent, or the like, are employed to detect events or processes, loaded
microparticles are spread on a planar substrate, e.g. a glass slide, for examination with
a scanning system, such as described in International patent applications
PCT/US91/09217, PCT/NL90/00081, and PCT/US95/01886. The scanning system
should be able to reproducibly scan the substrate and to define the positions of each
microparticle in a predetermined region by way of a coordinate system. In 
polynucleotide sequencing applications, it is important that the positional 
identification of microparticles be repeatable in successive scan steps. 

Such scanning systems may be constructed from commercially available
components, e.g. x-y translation table controlled by a digital computer used with a
detection system comprising one or more photomultiplier tubes, or alternatively, a
CCD array, and appropriate optics, e.g. for exciting, collecting, and sorting
fluorescent signals. In some embodiments a confocal optical system may be
desirable. An exemplary scanning system suitable for use in four-color sequencing is
illustrated diagrammatically in Figure 5. Substrate 300, e.g. a microscope slide with
fixed microparticles, is placed on x-y translation table 302, which is connected to and
controlled by an appropriately programmed digital computer 304 which may be any of
a variety of commercially available personal computers, e.g. 486-based machines or
PowerPC model 7100 or 8100 available from Apple Computer (Cupertino, CA).
Computer software for table translation and data collection functions can be provided
by commercially available laboratory software, such as Lab Windows, available from 
National Instruments. 
Substrate 300 and table 302 are operationally associated with microscope 306
having one or more objective lenses 308 which are capable of collecting and
delivering light to microparticles fixed to substrate 300. Excitation beam 310 from
light source 312, which is preferably a laser, is directed to beam splitter 314, e.g. a
dichroic mirror, which re-directs the beam through microscope 306 and objective lens
308 which, in turn, focuses the beam onto substrate 300. Lens 308 collects
fluorescence 316 emitted from the microparticles and directs it through beam splitter
314 to signal distribution optics 318 which, in turn, directs fluorescence to one or
more suitable opto-electronic devices for converting some fluorescence characteristic,
e.g. intensity, lifetime, or the like, to an electrical signal. Signal distribution optics
318 may comprise a variety of components standard in the art, such as bandpass
filters, fiber optics, rotating mirrors, fixed position mirrors and lenses, diffraction
gratings, and the like. As illustrated in Figure 2, signal distribution optics 318 directs
fluorescence 316 to four separate photomultiplier tubes, 330, 332, 334, and 336,
whose output is then directed to pre-amps and photon counters 350, 352, 354, and
356. The output of the photon counters is collected by computer 304, where it can be
stored, analyzed, and viewed on video 360. Alternatively, signal distribution optics
318 could be a diffraction grating which directs fluorescent signal 318 onto a CCD
array.

The stability and reproducibility of the positional localization in scanning will
determine, to a large extent, the resolution for separating closely spaced
microparticles. Preferably, the scanning system should be capable of resolving
closely spaced microparticles, e.g. separated by a particle diameter or less. Thus, for
most applications, e.g. using CPG microparticles, the scanning system should at least
have the capability of resolving objects on the order of 10-100 µm. Even higher
resolution may be desirable in some embodiments, but with increased resolution, the
time required to fully scan a substrate will increase; thus, in some embodiments a
compromise may have to be made between speed and resolution. Reductions in
scanning time can be achieved by a system which only scans positions where
microparticles are known to be located, e.g. from an initial full scan. Preferably,
microparticle size and scanning system resolution are selected to permit resolution of
fluorescently labeled microparticles randomly disposed on a plane at a density
of between about ten thousand and one hundred thousand microparticles per cm².
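As a rough check on these figures, the typical spacing between microparticles can be estimated from the areal density alone. The sketch below is illustrative only: it assumes each microparticle occupies, on average, a square cell of area 1/density, and the function name is hypothetical rather than taken from the specification.

```python
import math

def mean_spacing_um(density_per_cm2: float) -> float:
    """Approximate centre-to-centre spacing in micrometres.

    Assumption: each particle owns a square cell of area 1/density.
    This is an order-of-magnitude estimate, not a statistical model
    of randomly disposed particles.
    """
    cell_area_cm2 = 1.0 / density_per_cm2
    return math.sqrt(cell_area_cm2) * 1e4  # 1 cm = 10,000 micrometres

for density in (1e4, 1e5):
    spacing = mean_spacing_um(density)
    print(f"{density:.0e} per cm^2 -> about {spacing:.0f} micrometres apart")
```

Under this assumption the spacing is about 100 µm at ten thousand microparticles per cm² and about 32 µm at one hundred thousand, which falls within the 10-100 µm resolution range discussed above.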

In sequencing applications, loaded microparticles can be fixed to the surface
of a substrate in a variety of ways. The fixation should be strong enough to allow the
microparticles to undergo successive cycles of reagent exposure and washing without
significant loss. When the substrate is glass, its surface may be derivatized with an
alkylamino linker using commercially available reagents, e.g. from Pierce Chemical,
which in turn may be cross-linked to avidin, again using conventional chemistries, to
form an avidinated surface. Biotin moieties can be introduced to the loaded
microparticles in a number of ways. For example, a fraction, e.g. 10-15 percent, of the
cloning vectors used to attach tags to polynucleotides are engineered to contain a
unique restriction site (providing sticky ends on digestion) immediately adjacent to the
polynucleotide insert at the end of the polynucleotide opposite the tag. The site is
excised with the polynucleotide and tag for loading onto microparticles. After
loading, about 10-15 percent of the loaded polynucleotides will possess the unique
restriction site distal from the microparticle surface. After digestion with the
associated restriction endonuclease, an appropriate double stranded adaptor
containing a biotin moiety is ligated to the sticky end. The resulting microparticles
are then spread on the avidinated glass surface where they become fixed via the
biotin-avidin linkages.

Alternatively and preferably when sequencing by ligation is employed, in the
initial ligation step a mixture of probes is applied to the loaded microparticles: a
fraction of the probes contain a type IIs restriction recognition site, as required by the
sequencing method, and a fraction of the probes have no such recognition site, but
instead contain a biotin moiety at their non-ligating ends. Preferably, the mixture
comprises about 10-15 percent of the biotinylated probe.

In still another alternative, when DNA-loaded microparticles are applied to a
glass substrate, the DNA may nonspecifically adsorb to the glass surface upon
incubation for several hours, e.g. 24 hours, to create a bond sufficiently strong to
permit repeated exposures to reagents and washes without significant loss of
microparticles. Preferably, such a glass substrate is a flow cell, which may comprise a
channel etched in a glass slide. Preferably, such a channel is closed so that fluids may
be pumped through it and has a depth sufficiently close to the diameter of the
microparticles so that a monolayer of microparticles is trapped within a defined
observation region.

Identification of Novel Polynucleotides
in cDNA Libraries

Novel polynucleotides in a cDNA library can be identified by constructing a
library of cDNA molecules attached to microparticles, as described above. A large
fraction of the library, or even the entire library, can then be partially sequenced in
parallel. After isolation of mRNA, and perhaps normalization of the population as
taught by Soares et al., Proc. Natl. Acad. Sci., 91: 9228-9232 (1994), or like
references, the following primer may be hybridized to the polyA tails for first strand
synthesis with a reverse transcriptase using conventional protocols (SEQ ID NO: 1):

    5'-mRNA-[A]n-3'
            [T]19-[primer site]-GG[W,W,W,C]9ACCAGCTGATC-5'

where [W,W,W,C]9 represents a tag as described above, "ACCAGCTGATC" is an
optional sequence forming a restriction site in double stranded form, and "primer site"
is a sequence common to all members of the library that is later used as a primer
binding site for amplifying polynucleotides of interest by PCR.

After reverse transcription and second strand synthesis by conventional
techniques, the double stranded fragments are inserted into a cloning vector as
described above and amplified. The amplified library is then sampled and the sample
amplified. The cloning vectors from the amplified sample are isolated, and the tagged
cDNA fragments excised and purified. After rendering the tag single stranded with a
polymerase as described above, the fragments are methylated and sorted onto
microparticles in accordance with the invention. Preferably, as described above, the
cloning vector is constructed so that the tagged cDNAs can be excised with an
endonuclease, such as Fok I, that will allow immediate sequencing by the preferred
single base method after sorting and ligation to microparticles.

Stepwise sequencing is then carried out simultaneously on the whole library,
or one or more large fractions of the library, in accordance with the invention until a
sufficient number of nucleotides are identified on each cDNA for unique
representation in the genome of the organism from which the library is derived. For
example, if the library is derived from mammalian mRNA, then a randomly selected
sequence 14-15 nucleotides long is expected to have unique representation among the
2-3 thousand megabases of the typical mammalian genome. Of course, identification
of far fewer nucleotides would be sufficient for unique representation in a library
derived from bacteria, or other lower organisms. Preferably, at least 20-30
nucleotides are identified to ensure unique representation and to permit construction
of a suitable primer as described below. The tabulated sequences may then be
compared to known sequences to identify unique cDNAs.
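The relationship between read length and uniqueness can be illustrated with a back-of-the-envelope calculation. The sketch below is not part of the specification: it assumes a genome of independent, equally likely bases, which real genomes only approximate, and the function name is hypothetical.

```python
def expected_chance_matches(k: int, genome_bases: float) -> float:
    """Expected number of genome positions matching a given k-mer by chance.

    Assumption: four equally likely, independent bases at every position.
    """
    return genome_bases / 4 ** k

GENOME_BASES = 3e9  # upper end of the "2-3 thousand megabases" quoted above

for k in (14, 15, 20, 30):
    print(f"{k:2d} nucleotides: {expected_chance_matches(k, GENOME_BASES):.3g}")
```

Under these simplifying assumptions the expected number of chance matches falls from roughly ten at 14 nucleotides to well below one at 20 nucleotides, which illustrates why identifying at least 20-30 nucleotides is preferred.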

Unique cDNAs are then isolated by conventional techniques, e.g. constructing
a probe from the PCR amplicon produced with primers directed to the primer site and
the portion of the cDNA whose sequence was determined. The probe may then be
used to identify the cDNA in a library using a conventional screening protocol.

The above method for identifying new cDNAs may also be used to fingerprint
mRNA populations, either in isolated measurements or in the context of a
dynamically changing population. Partial sequence information is obtained
simultaneously from a large sample, e.g. ten to a hundred thousand, or more, of
cDNAs attached to separate microparticles as described in the above method.
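A fingerprint of this kind amounts to a frequency distribution over the partial sequences read from individual microparticles. The following is a minimal sketch under that reading; the function name and the example reads are invented for illustration and do not come from the specification.

```python
from collections import Counter

def expression_fingerprint(signatures):
    """Map each partial cDNA sequence to its relative frequency in the sample."""
    counts = Counter(signatures)
    total = sum(counts.values())
    return {seq: n / total for seq, n in counts.items()}

# Made-up reads, one per microparticle, for illustration only.
reads = ["GATTACAGG", "GATTACAGG", "CCTGAAGTC", "GATTACAGG", "TTGACCGTA"]
fingerprint = expression_fingerprint(reads)

for seq, freq in sorted(fingerprint.items(), key=lambda item: -item[1]):
    print(seq, f"{freq:.2f}")
```

Comparing two such dictionaries, e.g. from populations sampled at different times, would show which sequences rise or fall in relative abundance.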
Example 1

Construction of a Tag Library

An exemplary tag library is constructed as follows to form the chemically
synthesized 9-word tags of nucleotides A, G, and T defined by the formula:

    3'-TGGC-[4(A,G,T)9]-CCCCp

where "[4(A,G,T)9]" indicates a tag mixture where each tag consists of nine 4-mer
words of A, G, and T; and "p" indicates a 5' phosphate. This mixture is ligated to the
following right and left primer binding regions (SEQ ID NO: 4 and SEQ ID NO: 5):

    5'-AGTGGCTGGGCATCGGACCG            5'-GGGGCCCAGTCAGCGTCGAT
       TCACCGACCCGTAGCCp                      GGGTCAGTCGCAGCTA

             LEFT                                 RIGHT

The right and left primer binding regions are ligated to the above tag mixture, after
which the single stranded portion of the ligated structure is filled in with DNA
polymerase, then mixed with the right and left primers indicated below and amplified
to give a tag library (SEQ ID NO: 6).


    Left Primer

    5'-AGTGGCTGGGCATCGGACCG

    5'-AGTGGCTGGGCATCGGACCG-[4(A,G,T)9]-GGGGCCCAGTCAGCGTCGAT
       TCACCGACCCGTAGCCTGGC-[4(A,G,T)9]-CCCCGGGTCAGTCGCAGCTA

                                        CCCCGGGTCAGTCGCAGCTA-5'

                                        Right Primer

The underlined portion of the left primer binding region indicates a Rsr II recognition
site. The left-most underlined region of the right primer binding region indicates
recognition sites for Bsp 120I, Apa I, and Eco O109I, and a cleavage site for Hga I.
The right-most underlined region of the right primer binding region indicates the
recognition site for Hga I. Optionally, the right or left primers may be synthesized
with a biotin attached (using conventional reagents, e.g. available from Clontech
Laboratories, Palo Alto, CA) to facilitate purification after amplification and/or
cleavage.
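Because the underlining is not reproduced in this text, the positions of the named sites can be recovered by searching the primer binding regions directly. The sketch below does so; the recognition sequences in SITES are supplied from general knowledge of these enzymes and are assumptions rather than part of the specification, and the helper names are hypothetical.

```python
import re

# IUPAC nucleotide codes used by the recognition sequences below.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "W": "[AT]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGTRYWN", "TGCAYRWN")

def revcomp(seq: str) -> str:
    """Reverse complement, preserving IUPAC ambiguity codes."""
    return seq.translate(COMPLEMENT)[::-1]

def find_site(site: str, seq: str) -> list:
    """0-based start positions of `site` on either strand of `seq`."""
    hits = set()
    for pattern in (site, revcomp(site)):
        regex = "".join(IUPAC[base] for base in pattern)
        hits.update(m.start() for m in re.finditer(f"(?={regex})", seq))
    return sorted(hits)

LEFT = "AGTGGCTGGGCATCGGACCG"   # SEQ ID NO: 4
RIGHT = "GGGGCCCAGTCAGCGTCGAT"  # SEQ ID NO: 5

# Assumed recognition sequences (not stated in the specification).
SITES = {"Rsr II": "CGGWCCG", "Apa I": "GGGCCC",
         "Eco O109I": "RGGNCCY", "Hga I": "GACGC"}

print("Rsr II in LEFT:", find_site(SITES["Rsr II"], LEFT))
for name in ("Apa I", "Eco O109I", "Hga I"):
    print(f"{name} in RIGHT:", find_site(SITES[name], RIGHT))
```

The same `revcomp` helper confirms that the right primer (SEQ ID NO: 6) is the reverse complement of the right primer binding region.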
NOT FURNISHED UPON FILING

    primer binding site               Ppu MI site

    -CAAATTTG-CCTAGG-AGAAGGAGAAGGAGAAGG-
        ↑        ↑
    Pme I site   Bam HI site

The plasmid is cleaved with Ppu MI and Pme I (to give a Rsr II-compatible end and a
flush end so that the insert is oriented) and then methylated with DAM methylase.
The tag-containing construct is cleaved with Rsr II and then ligated to the open
plasmid, after which the conjugate is cleaved with Mbo I and Bam HI to permit
ligation and closing of the plasmid. The plasmid is then amplified and isolated and
used in accordance with the invention.



Example 2

Changes in Gene Expression Profiles in Liver Tissue of Rats
Exposed to Various Xenobiotic Agents

In this experiment, to test the capability of the method of the invention to
detect genes induced as a result of exposure to xenobiotic compounds, the gene
expression profile of rat liver tissue is examined following administration of several
compounds known to induce the expression of cytochrome P-450 isoenzymes. The
results obtained from the method of the invention are compared to results obtained
from reverse transcriptase PCR measurements and immunochemical measurements of
the cytochrome P-450 isoenzymes. Protocols and materials for the latter assays are
described in Morris et al., Biochemical Pharmacology, 52: 781-792 (1996).

Male Sprague-Dawley rats between the ages of 6 and 8 weeks and weighing
200-300 g are used, and food and water are available to the animals ad lib. Test
compounds are phenobarbital (PB), metyrapone (MET), dexamethasone (DEX),
clofibrate (CLO), corn oil (CO), and β-naphthoflavone (BNF), and are available from
Sigma Chemical Co. (St. Louis, MO). Antibodies against specific P-450 enzymes are
available from the following sources: rabbit anti-rat CYP3A1 from Human Biologics,
Inc. (Phoenix, AZ); goat anti-rat CYP4A1 from Daiichi Pure Chemicals Co. (Tokyo,
Japan); monoclonal mouse anti-rat CYP1A1, monoclonal mouse anti-rat CYP2C11,
goat anti-rat CYP2E1, and monoclonal mouse anti-rat CYP2B1 from Oxford
Biochemical Research, Inc. (Oxford, MI). Secondary antibodies (goat anti-rabbit IgG,
rabbit anti-goat IgG and goat anti-mouse IgG) are available from Jackson
ImmunoResearch Laboratories (West Grove, PA).

Animals are administered either PB (100 mg/kg), BNF (100 mg/kg), MET
(100 mg/kg), DEX (100 mg/kg), or CLO (250 mg/kg) for 4 consecutive days via
intraperitoneal injection following a dosing regimen similar to that described by
Wang et al., Arch. Biochem. Biophys. 290: 355-361 (1991). Animals treated with
H2O and CO are used as controls. Two hours following the last injection (day 4),
animals are killed, and the livers are removed. Livers are immediately frozen and
stored at -70°C.
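Since doses are given per kilogram of body weight, the amount of compound per injection scales with each animal's weight. The following sketch works through the arithmetic for the 200-300 g weight range stated earlier; the function name is hypothetical and the calculation is only an illustration of the stated dose levels.

```python
def dose_mg(body_weight_g: float, dose_mg_per_kg: float) -> float:
    """Mass of compound (mg) for one injection, scaled to body weight."""
    return dose_mg_per_kg * body_weight_g / 1000.0

# Dose levels from the text; body weights span the stated 200-300 g range.
DOSES_MG_PER_KG = {"PB": 100, "BNF": 100, "MET": 100, "DEX": 100, "CLO": 250}

for compound, level in DOSES_MG_PER_KG.items():
    low, high = dose_mg(200, level), dose_mg(300, level)
    print(f"{compound}: {low:.0f}-{high:.0f} mg per injection")
```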

Total RNA is prepared from frozen liver tissue using a modification of the
method described by Xie et al., Biotechniques, 11: 326-327 (1991). Approximately
100-200 mg of liver tissue is homogenized in the RNA extraction buffer described by
Xie et al. to isolate total RNA. The resulting RNA is reconstituted in
diethylpyrocarbonate-treated water, quantified spectrophotometrically at 260 nm, and
adjusted to a concentration of 100 µg/ml. Total RNA is stored in
diethylpyrocarbonate-treated water for up to 1 year at -70°C without any apparent
degradation. RT-PCR and sequencing are performed on samples from these
preparations.
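The quantification and adjustment step can be sketched numerically. The conversion factor of 40 µg/ml per absorbance unit is a commonly used rule of thumb for RNA at 260 nm with a 1 cm path length; it is an assumption here, not a value given in the text, and the function names and example readings are hypothetical.

```python
# Assumed rule of thumb (not from the text): A260 of 1.0 ~ 40 ug/ml RNA, 1 cm path.
RNA_UG_PER_ML_PER_A260 = 40.0

def rna_conc_ug_per_ml(a260: float, dilution_factor: float = 1.0) -> float:
    """Concentration of the undiluted stock from an A260 reading of a dilution."""
    return a260 * RNA_UG_PER_ML_PER_A260 * dilution_factor

def water_to_add_ul(stock_conc: float, stock_volume_ul: float,
                    target_conc: float = 100.0) -> float:
    """Volume of water (ul) that brings the stock to target_conc (C1*V1 = C2*V2)."""
    if stock_conc < target_conc:
        raise ValueError("stock is already more dilute than the target")
    return stock_volume_ul * (stock_conc / target_conc - 1.0)

# Illustrative reading: A260 of 0.5 measured on a 1:50 dilution of the stock.
stock = rna_conc_ug_per_ml(0.5, dilution_factor=50)
print(stock, water_to_add_ul(stock, stock_volume_ul=50.0))
```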

For sequencing, samples of RNA corresponding to about 0.5 µg of poly(A)⁺
RNA are used to construct libraries of tag-cDNA conjugates following the protocol
described in the section entitled "Attaching Tags to Polynucleotides for Sorting onto
Solid Phase Supports," with the following exception: the tag repertoire is constructed
from six 4-nucleotide words from Table II. Thus, the complexity of the repertoire is
8⁶, or about 2.6 x 10⁵. For each tag-cDNA conjugate library constructed, ten samples
of about ten thousand clones are taken for amplification and sorting. Each of the
amplified samples is separately applied to a fixed monolayer of about 10⁶ 10 µm
diameter GMA beads containing tag complements. That is, the "sample" of tag
complements in the GMA bead population on each monolayer is about four fold the
total size of the repertoire, thus ensuring there is a high probability that each of the
sampled tag-cDNA conjugates will find its tag complement on the monolayer. After
the oligonucleotide tags of the amplified samples are rendered single stranded as
described above, the tag-cDNA conjugates of the samples are separately applied to the
monolayers under conditions that permit specific hybridization only between
oligonucleotide tags and tag complements forming perfectly matched duplexes.
Concentrations of the amplified samples and hybridization times are selected to
permit the loading of about 5 x 10⁴ to 2 x 10⁵ tag-cDNA conjugates on each bead
where perfect matches occur. After ligation, 9-12 nucleotide portions of the attached
cDNAs are determined in parallel by the single base sequencing technique described
by Brenner in International patent application PCT/US95/03678. Frequency
distributions for the gene expression profiles are assembled from the sequence
information obtained from each of the ten samples.
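The sampling figures in this paragraph can be checked with a short calculation. The sketch below is only an illustration under two stated assumptions: that each sampled clone carries a tag drawn uniformly at random from the repertoire, and that the beads on a monolayer are likewise a uniform random draw of tag complements. The variable names are hypothetical.

```python
words_per_position = 8   # the repertoire is stated as 8^6
words_per_tag = 6
repertoire = words_per_position ** words_per_tag

clones_sampled = 10_000          # "about ten thousand clones" per sample
beads_per_monolayer = 1_000_000  # about four fold the repertoire

# Chance that a given clone shares its tag with no other clone in the sample.
p_unique_tag = (1 - 1 / repertoire) ** (clones_sampled - 1)

# Chance that a given tag complement sits on at least one bead of the monolayer.
p_complement_present = 1 - (1 - 1 / repertoire) ** beads_per_monolayer

print(repertoire)
print(f"{p_unique_tag:.3f} {p_complement_present:.3f}")
```

Under these assumptions roughly 96 percent of sampled clones carry a tag not shared with any other clone in the sample, and a given tag complement is present on the monolayer with a probability of roughly 98 percent.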

RT-PCRs of selected mRNAs corresponding to cytochrome P-450 genes and
the constitutively expressed cyclophilin gene are carried out as described in Morris et
al. (cited above). Briefly, a 20 µL reaction mixture is prepared containing 1x reverse
transcriptase buffer (Gibco BRL), 10 mM dithiothreitol, 0.5 mM dNTPs, 2.5 µM oligo
d(T)15 primer, 40 units RNasin (Promega, Madison, WI), 200 units RNase H-reverse
transcriptase (Gibco BRL), and 400 ng of total RNA (in diethylpyrocarbonate-treated
water). The reaction is incubated for 1 hour at 37°C followed by inactivation of the
enzyme at 95°C for 5 min. The resulting cDNA is stored at -10°C until used. For
PCR amplification of cDNA, a 10 µL reaction mixture is prepared containing 10x
polymerase reaction buffer, 2 mM MgCl2, 1 unit Taq DNA polymerase (Perkin-
Elmer, Norwalk, CT), 20 ng cDNA, and 200 nM concentration of the 5' and 3'
specific PCR primers of the sequences described in Morris et al. (cited above). PCRs
are carried out in a Perkin-Elmer 9600 thermal cycler for 23 cycles using melting,
annealing, and extension conditions of 94°C for 30 sec, 56°C for 1 min, and 72°C
for 1 min, respectively. Amplified cDNA products are separated by PAGE using 5%
native gels. Bands are detected by staining with ethidium bromide.

Western blots of the liver proteins are carried out using standard protocols
after separation by SDS-PAGE. Briefly, proteins are separated on 10% SDS-PAGE
gels under reducing conditions and immunoblotted for detection of P-450 isoenzymes
using a modification of the methods described in Harris et al., Proc. Natl. Acad. Sci.,
88: 1407-1410 (1991). Proteins are loaded at 50 µg/lane and resolved under constant
voltage (250 V) for approximately 4 hours at 2°C. Proteins are transferred to
nitrocellulose membranes (Bio-Rad, Hercules, CA) in 15 mM Tris buffer containing
120 mM glycine and 20% (v/v) methanol. The nitrocellulose membranes are blocked
with 2.5% BSA and immunoblotted for P-450 isoenzymes using primary monoclonal
and polyclonal antibodies and secondary alkaline phosphatase conjugated anti-IgG.
Immunoblots are developed with the Bio-Rad alkaline phosphatase substrate kit.

The three types of measurements of P-450 isoenzyme induction showed
substantial agreement.
APPENDIX Ia

Exemplary computer program for generating
minimally cross hybridizing sets
(single stranded tag/single stranded tag complement)

      Program minxh
c
c
      integer*2 sub1(6),mset1(1000,6),mset2(1000,6)
      dimension nbase(6)
c
c
      write(*,*)'ENTER SUBUNIT LENGTH'
      read(*,100)nsub
  100 format(i1)
      open(1,file='sub4.dat',form='formatted',status='new')
c
c
      nset=0
      do 7000 m1=1,3
      do 7000 m2=1,3
      do 7000 m3=1,3
      do 7000 m4=1,3
         sub1(1)=m1
         sub1(2)=m2
         sub1(3)=m3
         sub1(4)=m4
c
c
      ndiff=3
c
c
c        Generate set of subunits differing from
c        sub1 by at least ndiff nucleotides.
c        Save in mset1.
c
c
      jj=1
      do 900 j=1,nsub
  900 mset1(1,j)=sub1(j)
c
c
      do 1000 k1=1,3
      do 1000 k2=1,3
      do 1000 k3=1,3
      do 1000 k4=1,3
c
c
         nbase(1)=k1
         nbase(2)=k2
         nbase(3)=k3
         nbase(4)=k4
c
         n=0
         do 1200 j=1,nsub
            if(sub1(j).eq.1 .and. nbase(j).ne.1 .or.
     1         sub1(j).eq.2 .and. nbase(j).ne.2 .or.
     2         sub1(j).eq.3 .and. nbase(j).ne.3) then
               n=n+1
            endif
 1200    continue
c
c
         if(n.ge.ndiff) then
c
c           If number of mismatches
c           is greater than or equal
c           to ndiff then record
c           subunit in matrix mset
c
            jj=jj+1
            do 1100 i=1,nsub
 1100       mset1(jj,i)=nbase(i)
         endif
c
c
 1000 continue
c
c
      do 1325 j2=1,nsub
         mset2(1,j2)=mset1(1,j2)
 1325 mset2(2,j2)=mset1(2,j2)
c
c
c        Compare subunit 2 from
c        mset1 with each successive
c        subunit in mset1, i.e. 3,
c        4,5, ... etc.  Save those
c        with mismatches .ge. ndiff
c        in matrix mset2 starting at
c        position 2.
c
c        Next transfer contents
c        of mset2 into mset1 and
c        start
c        comparisons again this time
c        starting with subunit 3.
c        Continue until all subunits
c        undergo the comparisons.
c
c
      npass=0
c
c
 1700 continue
      kk=npass+2
      npass=npass+1
c
c
      do 1500 m=npass+2,jj
         n=0
         do 1600 j=1,nsub
            if(mset1(npass+1,j).eq.1 .and. mset1(m,j).ne.1 .or.
     1         mset1(npass+1,j).eq.2 .and. mset1(m,j).ne.2 .or.
     2         mset1(npass+1,j).eq.3 .and. mset1(m,j).ne.3) then
               n=n+1
            endif
 1600    continue
         if(n.ge.ndiff) then
            kk=kk+1
            do 1625 i=1,nsub
 1625       mset2(kk,i)=mset1(m,i)
         endif
 1500 continue
c
c
c        kk is the number of subunits
c        stored in mset2
c
c        Transfer contents of mset2
c        into mset1 for next pass.
c
c
      do 2000 k=1,kk
      do 2000 m=1,nsub
 2000 mset1(k,m)=mset2(k,m)
      if(kk.lt.jj) then
         jj=kk
         goto 1700
      endif
c
c
      nset=nset+1
      write(1,7009)
 7009 format(/)
      do 7008 k=1,kk
 7008 write(1,7010)(mset1(k,m),m=1,nsub)
 7010 format(4i1)
      write(*,*)
      write(*,120) kk,nset
  120 format(1x,'Subunits in set=',i5,2x,'Set No=',i5)
 7000 continue
      close(1)
c
c
      end
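The selection idea behind the program above can be sketched more compactly. The Python below is not a translation of minxh: it uses a simpler greedy rule (keep a candidate only if it differs from every word already kept in at least ndiff positions), and the function names, the seed word, and the three-letter alphabet are illustrative assumptions.

```python
from itertools import combinations, product

def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))

def minimally_cross_hybridizing(seed: str, alphabet: str = "agt",
                                ndiff: int = 3) -> list:
    """Greedily collect words that differ pairwise in at least ndiff positions."""
    kept = [seed]
    for candidate in product(alphabet, repeat=len(seed)):
        word = "".join(candidate)
        if all(mismatches(word, k) >= ndiff for k in kept):
            kept.append(word)
    return kept

words = minimally_cross_hybridizing("gatt")

# Every pair in the result satisfies the minimum-mismatch requirement.
assert all(mismatches(a, b) >= 3 for a, b in combinations(words, 2))
print(len(words), words)
```

A greedy pass of this kind yields a valid set but not necessarily the largest possible one; the result depends on the seed word and on the order in which candidates are visited.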
APPENDIX Ib

Exemplary computer program for generating
minimally cross hybridizing sets
(single stranded tag/single stranded tag complement)

      Program tagN
c
c
c     Program tagN generates minimally cross-hybridizing
c     sets of subunits given i) N--subunit length, and ii)
c     an initial subunit sequence.  tagN assumes that only
c     3 of the four natural nucleotides are used in the tags.
c
c
      character*1 sub1(20)
      integer*2 mset(10000,20), nbase(20)
c
c
      write(*,*)'ENTER SUBUNIT LENGTH'
      read(*,100)nsub
  100 format(i2)
c
c
      write(*,*)'ENTER SUBUNIT SEQUENCE'
      read(*,110)(sub1(k),k=1,nsub)
  110 format(20a1)
c
c
      ndiff=10
c
c     Let a=1 c=2 g=3 & t=4
c
      do 800 kk=1,nsub
         if(sub1(kk).eq.'a') then
            mset(1,kk)=1
         endif
         if(sub1(kk).eq.'c') then
            mset(1,kk)=2
         endif
         if(sub1(kk).eq.'g') then
            mset(1,kk)=3
         endif
         if(sub1(kk).eq.'t') then
            mset(1,kk)=4
         endif
  800 continue
c
c
c     Generate set of subunits differing from
c     sub1 by at least ndiff nucleotides.
c
      jj=1
c
c
      do 1000 k1=1,3
      do 1000 k2=1,3
      do 1000 k3=1,3
      do 1000 k4=1,3
      do 1000 k5=1,3
      do 1000 k6=1,3
      do 1000 k7=1,3
      do 1000 k8=1,3
      do 1000 k9=1,3
      do 1000 k10=1,3
      do 1000 k11=1,3
      do 1000 k12=1,3
      do 1000 k13=1,3
      do 1000 k14=1,3
      do 1000 k15=1,3
      do 1000 k16=1,3
      do 1000 k17=1,3
      do 1000 k18=1,3
      do 1000 k19=1,3
      do 1000 k20=1,3
c
c
         nbase(1)=k1
         nbase(2)=k2
         nbase(3)=k3
         nbase(4)=k4
         nbase(5)=k5
         nbase(6)=k6
         nbase(7)=k7
         nbase(8)=k8
         nbase(9)=k9
         nbase(10)=k10
         nbase(11)=k11
         nbase(12)=k12
         nbase(13)=k13
         nbase(14)=k14
         nbase(15)=k15
         nbase(16)=k16
         nbase(17)=k17
         nbase(18)=k18
         nbase(19)=k19
         nbase(20)=k20
c
         do 1250 nn=1,jj
c
            n=0
            do 1200 j=1,nsub
               if(mset(nn,j).eq.1 .and. nbase(j).ne.1 .or.
     1            mset(nn,j).eq.2 .and. nbase(j).ne.2 .or.
     2            mset(nn,j).eq.3 .and. nbase(j).ne.3 .or.
     3            mset(nn,j).eq.4 .and. nbase(j).ne.4) then
                  n=n+1
               endif
 1200       continue
c
            if(n.lt.ndiff) then
               goto 1000
            endif
 1250    continue
c
         jj=jj+1
         write(*,130)(nbase(i),i=1,nsub),jj
         do 1100 i=1,nsub
            mset(jj,i)=nbase(i)
 1100    continue
c
 1000 continue
c
c
      write(*,*)
  130 format(10x,20(1x,i1),5x,i5)
      write(*,*)
      write(*,120) jj
  120 format(1x,'Number of words=',i5)
c
c
      end
APPENDIX Ic

Exemplary computer program for generating
minimally cross hybridizing sets
(double stranded tag/single stranded tag complement)

      Program 3tagN
c
c
c     Program 3tagN generates minimally cross-hybridizing
c     sets of duplex subunits given i) N--subunit length,
c     and ii) an initial homopurine sequence.
c
c
      character*1 sub1(20)
      integer*2 mset(10000,20), nbase(20)
c
c
      write(*,*)'ENTER SUBUNIT LENGTH'
      read(*,100)nsub
  100 format(i2)
c
c
      write(*,*)'ENTER SUBUNIT SEQUENCE a & g only'
      read(*,110)(sub1(k),k=1,nsub)
  110 format(20a1)
c
      ndiff=10
c
c     Let a=1 and g=2
c
      do 800 kk=1,nsub
         if(sub1(kk).eq.'a') then
            mset(1,kk)=1
         endif
         if(sub1(kk).eq.'g') then
            mset(1,kk)=2
         endif
  800 continue
c
      jj=1
c
      do 1000 k1=1,3
      do 1000 k2=1,3
      do 1000 k3=1,3
      do 1000 k4=1,3
      do 1000 k5=1,3
      do 1000 k6=1,3
      do 1000 k7=1,3
      do 1000 k8=1,3
      do 1000 k9=1,3
      do 1000 k10=1,3
      do 1000 k11=1,3
      do 1000 k12=1,3
      do 1000 k13=1,3
      do 1000 k14=1,3
      do 1000 k15=1,3
      do 1000 k16=1,3
      do 1000 k17=1,3
      do 1000 k18=1,3
c     (page header: WO 97/13877, PCT/US96/16342)
      do 1000 k19=1,3
      do 1000 k20=1,3
c
         nbase(1)=k1
         nbase(2)=k2
         nbase(3)=k3
         nbase(4)=k4
         nbase(5)=k5
         nbase(6)=k6
         nbase(7)=k7
         nbase(8)=k8
         nbase(9)=k9
         nbase(10)=k10
         nbase(11)=k11
         nbase(12)=k12
         nbase(13)=k13
         nbase(14)=k14
         nbase(15)=k15
         nbase(16)=k16
         nbase(17)=k17
         nbase(18)=k18
         nbase(19)=k19
         nbase(20)=k20
c
         do 1250 nn=1,jj
c
            n=0
            do 1200 j=1,nsub
               if(mset(nn,j).eq.1 .and. nbase(j).ne.1 .or.
     1            mset(nn,j).eq.2 .and. nbase(j).ne.2 .or.
     2            mset(nn,j).eq.3 .and. nbase(j).ne.3 .or.
     3            mset(nn,j).eq.4 .and. nbase(j).ne.4) then
                  n=n+1
               endif
 1200       continue
c
            if(n.lt.ndiff) then
               goto 1000
            endif
 1250    continue
c
         jj=jj+1
         write(*,130)(nbase(i),i=1,nsub),jj
         do 1100 i=1,nsub
            mset(jj,i)=nbase(i)
 1100    continue
c
 1000 continue
c
      write(*,*)
  130 format(10x,20(1x,i1),5x,i5)
      write(*,*)
      write(*,120) jj
  120 format(1x,'Number of words=',i5)
c
      end
SEQUENCE LISTING

(1) GENERAL INFORMATION:

    (i) APPLICANT: David W. Martin, Jr.

    (ii) TITLE OF INVENTION: Measurement of Gene Expression Profiles in
         Toxicity Determination

    (iii) NUMBER OF SEQUENCES: 7

    (iv) CORRESPONDENCE ADDRESS:
         (A) ADDRESSEE: Stephen C. Macevicz, Lynx Therapeutics, Inc.
         (B) STREET: 3832 Bay Center Place
         (C) CITY: Hayward
         (D) STATE: California
         (E) COUNTRY: USA
         (F) ZIP: 94545

    (v) COMPUTER READABLE FORM:
         (A) MEDIUM TYPE: 3.5 inch diskette
         (B) COMPUTER: IBM compatible
         (C) OPERATING SYSTEM: Windows 3.1
         (D) SOFTWARE: Microsoft Word 5.1

    (vi) CURRENT APPLICATION DATA:
         (A) APPLICATION NUMBER:
         (B) FILING DATE:
         (C) CLASSIFICATION:

    (vii) PRIOR APPLICATION DATA:
         (A) APPLICATION NUMBER: PCT/US96/09513
         (B) FILING DATE: 06-JUN-96

    (vii) PRIOR APPLICATION DATA:
         (A) APPLICATION NUMBER: PCT/US95/12791
         (B) FILING DATE: 12-OCT-95

    (viii) ATTORNEY/AGENT INFORMATION:
         (A) NAME: Stephen C. Macevicz
         (B) REGISTRATION NUMBER: 30,285
         (C) REFERENCE/DOCKET NUMBER: 813wo

    (ix) TELECOMMUNICATION INFORMATION:
         (A) TELEPHONE: (510) 670-9365
         (B) TELEFAX: (510) 670-9302



(2) INFORMATION FOR SEQ ID NO: 1:

    (i) SEQUENCE CHARACTERISTICS:
         (A) LENGTH: 11 nucleotides
         (B) TYPE: nucleic acid
         (C) STRANDEDNESS: single
         (D) TOPOLOGY: linear

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

         CTAGTCGACC A

(2) INFORMATION FOR SEQ ID NO: 2:

    (i) SEQUENCE CHARACTERISTICS:
         (A) LENGTH: 11 nucleotides
         (B) TYPE: nucleic acid
         (C) STRANDEDNESS: single
         (D) TOPOLOGY: linear

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:

         NRRGATCYNN N

(2) INFORMATION FOR SEQ ID NO: 3:

    (i) SEQUENCE CHARACTERISTICS:
         (A) LENGTH: 38 nucleotides
         (B) TYPE: nucleic acid
         (C) STRANDEDNESS: single
         (D) TOPOLOGY: linear

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

         GAGGATGCCT TTATGGATCC ACTCGAGATC CCAATCCA



(2) INFORMATION FOR SEQ ID NO: 4:

    (i) SEQUENCE CHARACTERISTICS:
         (A) LENGTH: 20 nucleotides
         (B) TYPE: nucleic acid
         (C) STRANDEDNESS: double
         (D) TOPOLOGY: linear

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

         AGTGGCTGGG CATCGGACCG

(2) INFORMATION FOR SEQ ID NO: 5:

    (i) SEQUENCE CHARACTERISTICS:
         (A) LENGTH: 20 nucleotides
         (B) TYPE: nucleic acid
         (C) STRANDEDNESS: double
         (D) TOPOLOGY: linear

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

         GGGGCCCAGT CAGCGTCGAT

(2) INFORMATION FOR SEQ ID NO: 6:

    (i) SEQUENCE CHARACTERISTICS:
         (A) LENGTH: 20 nucleotides
         (B) TYPE: nucleic acid
         (C) STRANDEDNESS: single
         (D) TOPOLOGY: linear

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

         ATCGACGCTG ACTGGGCCCC

(2) INFORMATION FOR SEQ ID NO: 7:

    (i) SEQUENCE CHARACTERISTICS:
         (A) LENGTH: 62 nucleotides
         (B) TYPE: nucleic acid
         (C) STRANDEDNESS: double
         (D) TOPOLOGY: linear

    (xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:

         AAAAGGAGGA GGCCTTGATA GAGAGGACCT GTTTAAACGG ATCCTCTTCC
         TCTTCCTCTT CC
I claim:

1. A method of determining the toxicity of a compound, the method comprising
the steps of:

administering the compound to a test organism;

extracting a population of mRNA molecules from each of one or more tissues
of the test organism;

forming a separate population of cDNA molecules from each population of
mRNA molecules from the one or more tissues such that each cDNA molecule of a
separate population has an oligonucleotide tag attached, the oligonucleotide tags
being selected from the same minimally cross-hybridizing set;

separately sampling each population of cDNA molecules such that
substantially all different cDNA molecules within a separate population have different
oligonucleotide tags attached;

sorting the cDNA molecules of each separate population by specifically
hybridizing the oligonucleotide tags with their respective complements, the respective
complements being attached as uniform populations of substantially identical
complements in spatially discrete regions on one or more solid phase supports;

determining the nucleotide sequence of a portion of each of the sorted cDNA
molecules of each separate population to form a frequency distribution of expressed
genes for each of the one or more tissues; and

correlating the frequency distribution of expressed genes in each of the one or
more tissues with the toxicity of the compound.

2. The method of claim 1 wherein said oligonucleotide tag and said complement
of said oligonucleotide tag are single stranded.

3. The method of claim 2 wherein said oligonucleotide tag consists of a plurality
of subunits, each subunit consisting of an oligonucleotide of 3 to 9 nucleotides in
length and each subunit being selected from the same minimally cross-hybridizing set.

4. The method of claim 3 wherein said one or more solid phase supports are
microparticles and wherein said step of sorting said cDNA molecules onto the
microparticles produces a subpopulation of loaded microparticles and a subpopulation
of unloaded microparticles.

5. The method of claim 4 further including a step of separating said loaded
microparticles from said unloaded microparticles.

6. The method of claim 5 further including a step of repeating said steps of
sampling, sorting, and separating until the number of said loaded microparticles
accumulated is at least 10,000.

7. The method of claim 6 wherein said number of loaded microparticles is at
least 100,000.

8. The method of claim 7 wherein said number of loaded microparticles is at
least 500,000.

9. The method of claim 5 further including a step of repeating said steps of
sampling, sorting, and separating until the number of said loaded microparticles
accumulated is sufficient to estimate the relative abundance of a cDNA molecule
present in said population at a frequency within the range of from 0.1% to 5% with a
95% confidence limit no larger than 0.1% of said population.

10. The method of claim 4 wherein said test organism is a mammalian tissue
culture.

11. The method of claim 10 wherein said mammalian tissue culture comprises
hepatocytes.

12. The method of claim 4 wherein said test organism is an animal selected from
the group consisting of rats, mice, hamsters, guinea pigs, rabbits, cats, dogs, pigs, and
monkeys.

13. The method of claim 12 wherein said one or more tissues are selected from the
group consisting of liver, kidney, brain, cardiovascular, thyroid, spleen, adrenal, large
intestine, small intestine, pancreas, urinary bladder, stomach, ovary, testes, and
mesenteric lymph nodes.

14. A method of identifying genes which are differentially expressed in a selected
tissue of a test animal after treatment with a compound, the method comprising the
steps of:

administering the compound to a test animal;

extracting a population of mRNA molecules from the selected tissue of the
test animal;

forming a population of cDNA molecules from the population of mRNA
molecules such that each cDNA molecule has an oligonucleotide tag attached, the
oligonucleotide tags being selected from the same minimally cross-hybridizing set;

sampling the population of cDNA molecules such that substantially all
different cDNA molecules have different oligonucleotide tags attached;

sorting the cDNA molecules by specifically hybridizing the oligonucleotide
tags with their respective complements, the respective complements being attached as
uniform populations of substantially identical complements in spatially discrete
regions on one or more solid phase supports;

determining the nucleotide sequence of a portion of each of the sorted cDNA
molecules to form a frequency distribution of expressed genes; and

identifying genes expressed in response to administering the compound by
comparing the frequency distribution of expressed genes of the selected tissue of the
test animal with a frequency distribution of expressed genes of the selected tissue of a
control animal.

15. The method of claim 14 wherein said oligonucleotide tag and said
complement of said oligonucleotide tag are single stranded.

16. The method of claim 15 wherein said oligonucleotide tag consists of a
plurality of subunits, each subunit consisting of an oligonucleotide of 3 to 9
nucleotides in length and each subunit being selected from the same minimally
cross-hybridizing set.

17. The method of claim 16 wherein said one or more solid phase supports are
microparticles and wherein said step of sorting said cDNA molecules onto the
microparticles produces a subpopulation of loaded microparticles and a subpopulation
of unloaded microparticles.

18. The method of claim 17 further including a step of separating said loaded
microparticles from said unloaded microparticles.

19. The method of claim 18 further including a step of repeating said steps of
sampling, sorting, and separating until a number of said loaded microparticles
accumulated is at least 10,000.
20. The method of claim 19 wherein said number of loaded microparticles is at
least 100,000.

21. The method of claim 20 wherein said number of loaded microparticles is at
least 500,000.

22. The method of claim 18 further including a step of repeating said steps of
sampling, sorting, and separating until a number of said loaded microparticles
accumulated is sufficient to estimate the relative abundance of a cDNA molecule
present in said population at a frequency within the range of from 0.1% to 5% with a
95% confidence limit no larger than 0.1% of said population.

23. The method of claim 17 wherein said test animal is selected from the group
consisting of rats, mice, hamsters, guinea pigs, rabbits, cats, dogs, pigs, and monkeys.

24. The method of claim 23 wherein said selected tissue is selected from the
group consisting of liver, kidney, brain, cardiovascular, thyroid, spleen, adrenal, large
intestine, small intestine, pancreas, urinary bladder, stomach, ovary, testes, and
mesenteric lymph nodes.

25. A use of the technique of massively parallel signature sequencing to determine
the toxicity of a compound in a test organism, the use comprising the steps of:

administering the compound to a test organism;

extracting a population of mRNA molecules from each of one or more tissues
of the test organism and forming a population of cDNA molecules for each of the one
or more tissues;

determining the nucleotide sequence of a portion of each of the cDNA
molecules of each separate population using massively parallel signature sequencing
to form a frequency distribution of expressed genes for each of the one or more
tissues; and

correlating the frequency distribution of expressed genes in each of the one or
more tissues with the toxicity of the compound.

26. The use of claim 25 wherein said test organism is a mammalian tissue culture.

27. The use of claim 26 wherein said mammalian tissue culture comprises
hepatocytes.

28. The use of claim 25 wherein said test organism is an animal selected from the
group consisting of rats, mice, hamsters, guinea pigs, rabbits, cats, dogs, pigs, and
monkeys.

29. The use of claim 28 wherein said one or more tissues are selected from the
group consisting of liver, kidney, brain, cardiovascular, thyroid, spleen, adrenal, large
intestine, small intestine, pancreas, urinary bladder, stomach, ovary, testes, and
mesenteric lymph nodes.

30. A use of the technique of massively parallel signature sequencing to identify
genes which are differentially expressed in a test organism after treatment with a
compound and which are correlated with toxicity of the compound, the use
comprising the steps of:

administering the compound to the test organism;

extracting a population of mRNA molecules from a selected tissue of the test
organism and forming a population of cDNA molecules;

determining the nucleotide sequence of a portion of each of the cDNA
molecules using massively parallel signature sequencing to form a frequency
distribution of expressed genes;

identifying genes expressed in response to administering the compound by
comparing the frequency distribution of expressed genes of the selected tissue of the
test organism with a frequency distribution of expressed genes of the selected tissue
of a control organism; and

determining whether the genes expressed in response to administering the
compound are correlated with toxicity of the compound in the test organism.
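
The comparison step recited in claims 14 and 30 (forming a frequency distribution of
expressed genes for the treated and control tissue, then comparing the two) can be
pictured with a minimal sketch. This is an illustration only and is not taken from the
claims or the specification: the function names, the two-fold threshold, the floor
value, and the example signature strings are all hypothetical assumptions.

```python
from collections import Counter


def frequency_distribution(signatures):
    """Map each sequenced signature to its relative frequency in the sample."""
    counts = Counter(signatures)
    total = sum(counts.values())
    return {sig: n / total for sig, n in counts.items()}


def differentially_expressed(test, control, min_fold=2.0, floor=1e-6):
    """Return signatures whose frequency differs by at least min_fold.

    min_fold and floor are illustrative assumptions, not values from the claims.
    floor stands in for signatures that were not observed in one of the samples.
    """
    hits = {}
    for sig in set(test) | set(control):
        t = max(test.get(sig, 0.0), floor)
        c = max(control.get(sig, 0.0), floor)
        ratio = t / c
        if ratio >= min_fold or ratio <= 1.0 / min_fold:
            hits[sig] = ratio
    return hits


# Hypothetical signature reads from a treated and a control tissue sample.
treated_reads = ["GATCAAAA"] * 6 + ["GATCCCCC"] * 2 + ["GATCGGGG"] * 2
control_reads = ["GATCAAAA"] * 2 + ["GATCCCCC"] * 2 + ["GATCGGGG"] * 6

hits = differentially_expressed(
    frequency_distribution(treated_reads),
    frequency_distribution(control_reads),
)
for sig, ratio in sorted(hits.items()):
    print(sig, round(ratio, 2))
```

In this toy data the first signature is more frequent after treatment, the third is
less frequent, and the second is unchanged and therefore not reported. A real analysis
would also need a statistical test on the underlying counts, which this sketch omits.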
August 11, 1997, Monday

SECTION: Financial News

DISTRIBUTION: TO BUSINESS AND MEDICAL EDITORS
LENGTH: 478 words

HEADLINE: Eli Lilly & Co. and Acacia Biosciences Enter Into Research Collaboration;
First Corporate Agreement for Acacia's Genome Reporter Matrix(TM)

DATELINE: RICHMOND, Calif., Aug. 11 

BODY: 

Acacia Biosciences and Eli Lilly and Company (Lilly) announced today the signing of a joint research collaboration
to utilize Acacia's Genome Reporter Matrix(TM) (GRM) to aid in the selection and optimization of lead compounds.
Under the collaboration, Acacia will provide chemical and biological profiles on a class of Lilly's compounds for an
undisclosed fee.

Acacia's GRM is an assay-based computer modeling system that uses yeast as a miniature ecosystem. The GRM
can profile the extent, nature and quantity of any changes in gene expression. Because of the similarities between
the yeast and human genome, the system serves as an excellent surrogate for the human body, mimicking the effects
induced by a biologically active molecule.

"Using yeast as a model organism for lead optimization makes a lot of sense given the high degree of homology with
human metabolic pathways," said William Current of Lilly Research Laboratories. "Acacia's innovative GRM has
the potential to provide enormous insight into the therapeutic impact of our compounds and make the drug discovery
process more rational. It should substantially accelerate the development process."

"This first agreement with a major pharmaceutical company is an important milestone in the development of
Acacia," said Bruce Cohen, President and CEO of Acacia. "The deal is in line with our strategy of establishing
alliances that will allow our collaborators to use genomic profiles to identify and optimize compounds within
their existing portfolios. In the long run, this technology can be used to characterize large scale combinatorial
libraries, predict side effects prior to clinical trials and resurrect drugs that have failed during clinical trials."

The GRM incorporates two critical elements: chemical response profiles and genetic response profiles. The 
chemical response profiles measure the change in gene expression caused by potential therapeutics and then rank genes 
with altered expressions by degree of response. The genetic response profiles measure changes in gene expression 
caused by mutations in the genes encoding potential targets of pharmaceuticals; these genetic response profiles represent 
gold standards in drug discovery by defining the response profile expected for drugs with perfect selectivity and 
specificity. By comparing the two profiles, one can analyze a potential drug candidate's ability to mimic the action of 
a 'perfect' drug. 

Acacia Biosciences is a functional genomics company developing proprietary technologies to enhance the speed
and efficacy of drug discovery and development. Acacia's Genome Reporter Matrix capitalizes on the latest advances
in genomics and combinatorial chemistry to generate comprehensive profiles of drug candidates' in vivo activity.
SOURCE Acacia Biosciences

CONTACT: Bruce Cohen, President and CEO of Acacia Biosciences, 510-669-2330 ext. 103 or Media: Linda
Seaton of Feinstein
Pharmagene Raises More Capital for Research on Human Tissues

By Sophia Fox

Pharmagene, the Royston, UK-based biopharmaceutical company specialising in the use of
human biomaterials for drug discovery research, has raised a further £5 million from a
group of investors led by 3i and Abacus Nominees. The funding will enable the company to
expand both its human biomaterials collection and its capabilities across a range of
proprietary platform technologies.

Gordon Baxter, Ph.D., Pharmagene's cofounder and chief operating officer, claimed, "by the
end of this year Pharmagene will have access to the largest collection of human RNAs and
proteins anywhere in the world, and a range of innovative, yet robust technologies
SEE PHARMAGENE



Perkin-Elmer Acquires PerSeptive to Expand Its Capabilities in Gene-Based Drug Discovery

By John Sterling

Perkin-Elmer's (PE; Norwalk, CT) decision last month to acquire PerSeptive Biosystems
(Framingham, MA) via a $360 million stock swap was designed to strengthen PE in terms of
broad capabilities in gene-based drug discovery. The company's main goal is to develop new
products to improve the integration of genetic and protein research.

"This merger will enhance our position as an effective provider of innovative, integrated
platforms enabling our customers to be more efficient and cost-effective in bringing new
pharmaceuticals to market," says Tony L. White, PE's chairman, president and CEO. "The
combination of our two companies should bolster our presence in the life sciences, [and it
is our] belief that we must take bold action now to lead the emerging era of molecular
medicine with leading positions in both genetic and protein analysis."

A driving force behind the merger is the vast amount of genetic information about human
disease that is being accumulated by researchers and biotech companies working in the area
of genomics. It is becoming increasingly obvious that these data need to be complemented
with technologies for studying proteins and protein networks, a field known as proteomics
(see GEN, September 1, 1997, p. 1).

PE officials, who claim that MALDI-TOF (Matrix Assisted
SEE ACQUISITION, P. 10



FDA OKs Genzyme's Carticel Product for Damage to Knees

[Figure: Carticel procedure. Labels: Biopsy; Genzyme Tissue Repair; Cell Processing;
Periosteal flap.]

Carticel, which was approved for the repair of clinically significant, symptomatic
cartilaginous defects of the femoral condyle (medial, lateral or trochlear) caused by acute
or repetitive trauma, employs a proprietary process to grow autologous cartilage cells for
implantation.

By Naomi Pfeiffer

The FDA has approved a knee-cartilage replacement product made by Genzyme Tissue Repair
(Cambridge, MA), a tracking-stock division of Genzyme Corp., for people with
trauma-damaged knees.

Carticel (autologous cultured chondrocytes) is the first product to be licensed under the
FDA's pro-
SEE GENZYME



Strategies for Target Validation Streamline Evaluation of Leads

By Vicki Glaser

Acacia Biosciences (Richmond, CA) last month announced its first agreement with a major
pharmaceutical company, signing a deal with Eli Lilly (Indianapolis, IN) to use Acacia's
Genome Reporter Matrix (GRM) to select and optimize some of Lilly's lead compounds.
Acacia's yeast-based system for profiling drug activity is useful for evaluating the
therapeutic potential of lead compounds, and it also has a role in the identification and
validation of new drug targets.

"We're using the ecosystem of a cell to allow us to deduce the mechanism of action and
target for any chemical," explains Bruce Cohen, president and CEO. "We screen for every
target in a cell simultaneously...using transcription as a readout for how a cell is
adapting to any perturbation," he says.

The GRM technology consists of two main databases: one is the genetic response profile,
showing the effects of mutations in each individual yeast gene and compensatory gene
regulatory mechanisms; the other is the chemical response profile, which documents changes
in gene expression in response to chemical compounds. Computational analysis and pattern
matching between the genetic and chemical profiles yields information on the specificity,
potency and side-effects risk of a drug lead.

Targeting Targets

No longer is mapping and sequencing a gene, or the human genome, an end unto itself, but
SEE TARGET, P. 18



Sticky Ends

Avigen received two grants from the NIH & University of California for research on gene
therapy for treatment of cancer & HIV infections... KRI Pharmaceutical Services, of Reston,
VA, launched the TSM Bug Finder, which is able to locate & retrieve client-specified
microorganisms in real-time... Gensia Sicor, Inc. will move its corporate staff from San
Diego to Irvine, CA, by end of year...

FDA accepted NDA from Sepracor for levalbuterol HCl inhalation solution... An $11.7M
mezzanine financing has been closed by Activated Cell Therapy, which changed its name to
Dendreon Corporation... Astra AB will build major research facility in Waltham, MA, and is
also relocating Astra Arcus research facility from Rochester to Boston area... Prolifix
Ltd. team used a small peptide to inhibit the E2F protein complex and induced apoptosis in
mammalian tumor cells... Vertex Pharmaceuticals, Inc. and Alpha Therapeutic Corp. ended an
agreement to develop VX-366 for treatment of inherited hemoglobin disorders... NaviCyte
received Phase 1 SBIR grant for up to $100,000 from NIH for development of prototype of its
NaviFlow technology for high-throughput screening... Covance Inc. will invest $21 million
in expansion and renovation of its facility in Indianapolis, IN.
merely a means to an end. The critical next step is to validate the gene and its protein
product as a potential drug target. The Human Genome Project continues to produce a
treasure chest of expressed sequence tags (ESTs) and a tantalizing array of complete gene
sequences.

Companies are applying a variety of functional genomic strategies to link genes to
specific diseases and to multigenic phenotypes. Yet the ultimate challenge for
pharmaceutical companies is to sift through all the sequence and differential gene
expression data to identify the best targets for drug discovery.

Spinning off technology developed at the University of North Carolina (Chapel Hill),
Cytogen Corp. (Princeton, NJ) formed its wholly owned subsidiary AxCell Biosciences earlier
this year. The young company is building a protein interaction database, cataloging all
the interactions the modular domains of proteins can engage in with a range of ligands, in
order to gain insight into protein function and to select the most critical interaction to
target for drug development.

AxCell's cloning-of-ligand-targets (COLT) technology employs "recognition units" from the
company's genetic diversity library (GDL) to map functional protein interactions and
quantitate their affinity. The company's inter-functional proteomic database (IFP-dbase)
elucidates protein interaction networks and structure-activity relationships based on
ligand affinity with protein modular domains.

Defining Disease Pathways

Signal Pharmaceuticals, Inc.'s (San Diego, CA) integrated drug target and discovery effort
is based on mapping gene-regulating pathways in cells and identifying small molecules that
regulate the activation of those genes. In collaboration with academic researchers, the
company has identified a large number of regulatory proteins in several mitogen-activated
protein (MAP) kinase pathways (including the JNK, FRK and p38 signaling pathways), which
Signal is evaluating for the treatment of autoimmune, inflammatory, cardiovascular and
neurologic diseases, and cancer. Other target identification programs focus on the NF-kB
pathway, estrogen-related genes and central/peripheral nervous system genes.

Regulating cytokine production in immune and inflammatory disorders



and modifying bone metabolism to treat osteoporosis are the focus of Signal's
collaboration with Tanabe Seiyaku (Osaka, Japan). Signal has partnered with Organon/Akzo
Nobel (Netherlands) to identify estrogen-responsive genes as targets for treating
neurodegenerative and psychiatric diseases, atherosclerosis and ischemia, and with Roche
Bioscience (Palo Alto, CA) to develop human peripheral nerve cell lines for the discovery
of treatments for pain and incontinence.

Exelixis' (S. San Francisco, CA) strategy for target selection is to define disease
pathways and identify regulatory molecules that activate or inhibit those
biochemical/genetic pathways. Based on the finding that these pathways are conserved
across species, the company is studying the model genetic systems of Drosophila and
Caenorhabditis elegans. Using its Pathfinder technology, Exelixis systematically
introduces mutations into the genomes of these model organisms, looking for mutations that
enhance or suppress the target disease-related gene. These novel genes then become the
basis of drug screening assays.

Cadus Pharmaceutical Corp. (Tarrytown, NY) is identifying surrogate ligands to newly
discovered orphan G-protein coupled transmembrane receptors of unknown function to
determine the suitability of the receptors as drug targets. Inserting the novel receptor
in a yeast system yields a ligand that activates the receptor. Access to a surrogate
ligand allows the company to screen for receptor antagonists in the yeast system.

"The antagonist plus the surrogate ligand gives you two probes, an on probe and an off
probe, which allows you to look at function," explains David Webb, Ph.D., vp of research
and chief scientific officer. A surrogate ligand also provides information on which
G-protein interacts with the orphan receptor and its associated signaling pathways,
further clarifying the role of the receptor as a potential drug target. Cadus'
collaboration with SmithKline (Philadelphia) capitalizes on Cadus' ability to determine
orphan receptor function, applying the technology to SmithKline's proprietary, newly
discovered G-protein receptors.

Cadus' recombinant yeast system can also be used to screen cell and tissue extracts for
natural ligands, and the company is accelerating its internal drug-discovery efforts in
the areas of cancer, inflammation and allergy. A recent equity investment in Axiom
Biotechnologies (San Diego, CA) gave Cadus a license to Axiom's high-throughput
pharmacologic screening system for lead optimization and discovery.

As its name implies, gene/Networks (Alameda, CA) focuses on identifying gene networks that
contribute to multigenic phenotypes and complex disease processes. The integration of
mouse and human genetic studies forms the basis of the technology. The Genome Tagged Mice
database in development will serve as a library of natural mouse genetic and phenotypic
variation. Disease-related genes identified in mice are then evaluated in human family-
and population-based studies to confirm their clinical relevance and linkages to
pathophysiologic traits.

Blocking Gene Expression

Inactivating a gene known to be expressed in association with a particular disease is one
approach to identifying appropriate therapeutic targets. The target validation and
discovery program at Ribozyme Pharmaceuticals, Inc. (Boulder, CO) applies the company's
ribozyme technology to achieve selective inhibition of gene expression in cell culture and
in animals.

Correlation of the gene expression inhibition with phenotype can
SEE TARGET, P. 38
suggest the relative importance of the gene in disease pathology. The company's
nuclease-resistant ribozymes form the basis of a collaboration with Schering AG (Germany)
for drug target validation and the development of ribozyme-based therapeutic agents, and
with Chiron Corp. (Emeryville, CA) for target validation.

With several antisense compounds now progressing through clinical trials, the concept of
using oligonucleotides to inhibit gene activity is not new. But rather than focusing on
therapeutics development, Sequitur, Inc. (Natick, MA) is creating antisense compounds for
the purpose of determining gene function and validating drug targets. Clients typically
provide the one-year-old company with the sequence (or EST) of a potential gene target
and, in return, Sequitur custom designs a series of three to six antisense compounds that
yield a three-to-ten-fold inhibition of the target gene in cell culture. The company also
provides oligofectins, a series of cationic lipids, to deliver the oligonucleotides to a
variety of cultured cells.

"Differential expression information is just for correlation, it doesn't tell function or
confirm what would be a good target," says Tod Woolf, Ph.D., director of technology
development at Sequitur. Antisense compounds, by contrast, will inhibit a target. Sequitur
offers both phosphorothioate DNA antisense compounds and its proprietary Next Generation
chimeric oligonucleotides, which have a higher hybridization affinity, greater specificity
and reduced toxicity, according to the company.

Mining Pathogen Genomes

Companies such as Human Genome Sciences (HGS; Rockville, MD), Incyte (Palo Alto, CA),
Millennium Pharmaceuticals Inc. (Cambridge, MA) and Genome Therapeutics (Waltham, MA) are
relying on high-speed DNA sequencing, positional cloning and other strategies to identify
specific microbial genomic sites that would be good targets for infectious disease
therapeutics.

[Figure caption: AxCell Biosciences scientists say their technology enables the rapid and
simple functional identification of the two essential molecular components of protein
interaction networks: specific recognition units that bind distinct modular protein
domains are identified and isolated using a combination structural/functional approach
that uses both peptide phage display Genetic Diversity Libraries (GDL) and bioinformatics,
and Cloning of Ligand Targets (COLT) technology utilizes recognition units as functional
probes to isolate families of interactor proteins.]

HGS recently completed sequencing of the bacterial pathogen Streptococcus pneumoniae,
which is the focus of an agreement with Hoffmann-La Roche (Basel, Switzerland). Roche will
use the sequence data to develop new anti-infectives against S. pneumoniae. HGS and Roche
have expanded their collaboration to include a nonexclusive license to access sequence
information for the intestinal bacterium Enterococcus faecalis.

Incyte Pharmaceuticals has completed one-fold coverage of the Candida albicans genome,
identifying 60% of the genes of this fungal pathogen. This genome will become part of the
company's PathoSeq microbial database. Incyte recently introduced the ZooSeq animal gene
sequence and expression database. The database will provide genomic information across
various species commonly used in preclinical drug testing, which may help to better define
potential drug targets.

Millennium Pharmaceuticals continues to report success in identifying novel drug targets,
having recently discovered a novel chemokine called neurotactin and a new class of
MAD-related proteins that inhibit transforming growth factor beta (TGF-β) signaling. The
company also received U.S. patent coverage for the tub genes, believed to play a role in
obesity, and for the gene that encodes the protein melastatin, which appears to suppress
metastasis in malignant melanoma.



Smith, now a computer programmer, is an expert in systems integration, Internet
technologies and the application of industrial engineering principles to the drug
discovery process. Before co-founding Pangea, he was the manager of software development
at Attorney's Briefcase, a legal research software company.

By being "in the trenches" with customers and collaborators, Bellenson and Smith sensed
the frustration of pharmaceutical researchers whose incompatible tools have impeded their
progress. According to Bellenson, "Most of them are geared toward analyzing one molecule
at a time. It's like emptying the ocean with an eye dropper, an incompatible eye dropper
at that. A pharmaceutical company may have 30 different drug discovery teams with various
approaches. The problem is to manage the process of experimenting with a lot of different
approaches, to automate while maintaining flexibility."

GeneWorld 2.1 enables "integration of the entire target discovery and validation
process," Bellenson says. The commercial software package coordinates the entire process
of sequence-data analysis and can be integrated with other programs and databases,
according to Smith, who adds that it handles thousands of sequence results, organizes and
automates annotation and seamlessly interacts with growing genome databases. Simple forms
and menus enable users to turn raw sequence data into crucial knowledge for drug discovery
by applying algorithms to sequences, creating custom analysis strategies and producing
useful reports, without the need for writing computer code. GeneWorld 2.1 runs on a
variety of platforms and operating systems.

Pairing industrial relational database-management systems with a web-browser interface,
Pangea's Operating System of Drug Discovery is an open-computing framework that allows
client/server and Java-enabled web-based technologies to collect, organize and analyze
drug discovery information for pharmaceutical companies to simplify and accelerate drug
discovery. The technology unites automated genomics database analysis for drug target site
selection, chemical information database analysis and large-scale combinatorial chemistry
project management and high-throughput screening project management for drug lead efficacy
analysis. Pangea officials maintain that these integrated elements provide a unified
environment for chemists, biologists and others involved in the drug discovery process to
work together with commercial and public domain software.

Pangea's Operating System of Drug Discovery can accommodate Sybase, Oracle or Informix
relational database-management systems and any version of UNIX. It absorbs new data
formats, databases, algorithms and analysis paradigms into the automated workflow without
software modifications. Netscape Navigator provides a friendly user interface from PC,
Macintosh, and UNIX workstations.

In the near term, Pangea plans to complete its bioinformatics core with two more programs.
Gene Foundry, a sample tracking and workflow sequence package for DNA sequence and
fragment information, will also offer interaction with robots, reagent tracking and
troubleshooting. Gene Thesaurus, the other package, is a "warehouse of bioinformatics
data," says Bellenson.
GTAC Chairman, Professor Norman C. Nevin, said 1996 saw "four important developments": an
increase in enquiries and submissions made to GTAC; an increase in the complexity of
submitted protocols; a continuing shift from gene therapy for single-gene disorders toward
strategies aimed at tumour destruction in cancer; and a growth in international
sponsorship of UK gene therapy trials.

Since 1993, GTAC and its predecessor, the Clothier Committee, have approved 18 UK gene
therapy clinical trials (13 of which have been carried out), which are listed in the
report. The disease areas targeted by these trials include severe combined
immunodeficiency (1 trial), cystic fibrosis (6), metastatic melanoma (2), lymphoma (2),
neuroblastoma (1), breast cancer (1), Hurler's syndrome (1), cervical cancer (1),
glioblastoma



breast cancer, breast cancer with liver metastases, glioblastoma, malignant ascites due to
gastrointestinal cancer and ovarian cancer.

Copies of the GTAC third annual report are available from the GTAC Secretariat, Wellington
House, 133-155 Waterloo Road, London SE1 8UG, UK.

Coated Lenses Prevent PCO

Scientists in the UK say it may be possible to prevent posterior capsule opacification
(PCO), a common complication following cataract surgery, by using the implanted
polymethylmethacrylate (PMMA) intraocular lens as a drug delivery system. PCO occurs in
30-50% of cataract surgery patients as a result of stimulated cell growth within the
remaining capsular bag. The condition causes a decline in visual acuity and requires
expensive laser treatment, thus negating the routine use of cataract surgery in
underdeveloped countries, explains G. Duncan, at the
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Exploring the Metabolic and Genetic Control of 
Gene Expression on a Genomic Scale 

Joseph L. DeRisi, Vishwanath R. Iyer, Patrick O. Brown* 

DNA microarrays containing virtually every gene of Saccharomyces cerevisiae were used 
to carry out a comprehensive investigation of the temporal program of gene expression 
accompanying the metabolic shift from fermentation to respiration. The expression 
profiles observed for genes with known metabolic functions pointed to features of the 
metabolic reprogramming that occur during the diauxic shift, and the expression patterns 
of many previously uncharacterized genes provided clues to their possible functions. The 
same DNA microarrays were also used to identify genes whose expression was affected
by deletion of the transcriptional co-repressor TUP1 or overexpression of the transcriptional
activator YAP1. These results demonstrate the feasibility and utility of this approach
to genomewide exploration of gene expression patterns.



The complete sequences of nearly a dozen 
microbial genomes are known, and in the 
next several years we expect to know the 
complete genome sequences of several 
metazoans, including the human genome. 
Defining the role of each gene in these 
genomes will be a formidable task, and understanding
how the genome functions as a
whole in the complex natural history of a 
living organism presents an even greater 
challenge. 

Knowing when and where a gene is 
expressed often provides a strong clue as to 
its biological role. Conversely, the pattern 
of genes expressed in a cell can provide 
detailed information about its state. Al- 
though regulation of protein abundance in 
a cell is by no means accomplished solely 
by regulation of mRNA, virtually all dif- 
ferences in cell type or state are correlated 
with changes in the mRNA levels of many 
genes. This is fortuitous because the only 
specific reagent required to measure the 
abundance of the mRNA for a specific 
gene is a cDNA sequence. DNA microar- 
rays, consisting of thousands of individual 
gene sequences printed in a high-density 
array on a glass microscope slide (1, 2), 
provide a practical and economical tool 
for studying gene expression on a very 
large scale (3-6). 

Saccharomyces cerevisiae is an especially 

Department of Biochemistry, Stanford University School 
of Medicine. Howard Hughes Medical Institute. Stanford. 
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favorable organism in which to conduct a 
systematic investigation of gene expression. 
The genes are easy to recognize in the ge- 
nome sequence, cis regulatory elements are 
generally compact and close to the tran- 
scription units, much is already known 
about its genetic regulatory mechanisms, 
and a powerful set of tools is available for its 
analysis. 

A recurring cycle in the natural history 
of yeast involves a shift from anaerobic 
(fermentation) to aerobic (respiration) me- 
tabolism. Inoculation of yeast into a medi- 
um rich in sugar is followed by rapid growth 
fueled by fermentation, with the production 
of ethanol. When the fermentable sugar is 
exhausted, the yeast cells turn to ethanol as 
a carbon source for aerobic growth. This 
switch from anaerobic growth to aerobic 
respiration upon depletion of glucose, re- 
ferred to as the diauxic shift, is correlated 
with widespread changes in the expression 
of genes involved in fundamental cellular 
processes such as carbon metabolism, pro- 
tein synthesis, and carbohydrate storage 
(7). We used DNA microarrays to charac- 
terize the changes in gene expression that 
take place during this process for nearly the 
entire genome, and to investigate the ge- 
netic circuitry that regulates and executes 
this program. 

Yeast open reading frames (ORFs) were 
amplified by the polymerase chain reaction 
(PCR), with a commercially available set of 
primer pairs (8). DNA microarrays, con- 
taining approximately 6400 distinct DNA 
sequences, were printed onto glass slides by 



using a simple robotic printing device (9). 
Cells from an exponentially growing culture 
of yeast were inoculated into fresh medium 
and grown at 30°C for 21 hours. After an 
initial 9 hours of growth, samples were har- 
vested at seven successive 2-hour intervals, 
and mRNA was isolated (10). Fluorescently
labeled cDNA was prepared by reverse transcription
in the presence of Cy3(green)-
or Cy5(red)-labeled deoxyuridine triphosphate
(dUTP) (11) and then hybridized to
the microarrays (12). To maximize the reliability
with which changes in expression
levels could be discerned, we labeled cDNA 
prepared from cells at each successive time 
point with Cy5, then mixed it with a Cy3- 
labeled "reference" cDNA sample prepared 
from cells harvested at the first interval 
after inoculation. In this experimental de- 
sign, the relative fluorescence intensity 
measured for the Cy3 and Cy5 fluors at 
each array element provides a reliable mea- 
sure of the relative abundance of the corre- 
sponding mRNA in the two cell popula- 
tions (Fig. 1). Data from the series of seven 
samples (Fig. 2), consisting of more than 
43,000 expression-ratio measurements,
were organized into a database to facilitate
efficient exploration and analysis of the
results. This database is publicly available
on the Internet (13).

During exponential growth in glucose- 
rich medium, the global pattern of gene 
expression was remarkably stable. Indeed, 
when gene expression patterns between the 
first two cell samples (harvested at a 2-hour 
interval) were compared, mRNA levels dif- 
fered by a factor of 2 or more for only 19 
genes (0.3%), and the largest of these differences
was only 2.7-fold (14). However, as
glucose was progressively depleted from the 
growth media during the course of the ex- 
periment, a marked change was seen in the 
global pattern of gene expression. mRNA 
levels for approximately 710 genes were 
induced by a factor of at least 2, and the 
mRNA levels for approximately 1030 genes 
declined by a factor of at least 2. Messenger 
RNA levels for 183 genes increased by a 
factor of at least 4, and mRNA levels for 
203 genes diminished by a factor of at least
4. About half of these differentially expressed
genes have no currently recognized
function and are not yet named. Indeed, 
more than 400 of the differentially ex- 
pressed genes have no apparent homology 
to any gene whose function is known (15).
The responses of these previously uncharacterized
genes to the diauxic shift therefore
provide the first small clue to their possible
roles.
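
The fold-change counts quoted above can be illustrated with a minimal sketch. It assumes each array element yields a pair of background-corrected Cy5 (time point) and Cy3 (reference) intensities; the function names, the dictionary layout, and the example values are hypothetical and are not taken from the authors' analysis software.

```python
import math

def classify_fold_change(cy5, cy3, threshold=2.0):
    """Label one array element by its Cy5/Cy3 ratio (hypothetical helper)."""
    if cy5 <= 0 or cy3 <= 0:
        raise ValueError("intensities must be positive")
    log_ratio = math.log2(cy5 / cy3)      # symmetric around zero
    cutoff = math.log2(threshold)
    if log_ratio >= cutoff:
        return "induced"
    if log_ratio <= -cutoff:
        return "repressed"
    return "unchanged"

def count_changes(measurements, threshold=2.0):
    """Tally induced/repressed/unchanged genes for one hybridization.

    measurements maps a gene name to an assumed (cy5, cy3) intensity pair.
    """
    counts = {"induced": 0, "repressed": 0, "unchanged": 0}
    for cy5, cy3 in measurements.values():
        counts[classify_fold_change(cy5, cy3, threshold)] += 1
    return counts

# Made-up intensities, for illustration only.
example = {"geneA": (5000.0, 1000.0),
           "geneB": (400.0, 2000.0),
           "geneC": (1100.0, 1000.0)}
print(count_changes(example, threshold=2.0))
```

Working in log2 space makes a twofold induction and a twofold repression equidistant from zero, which is why a single cutoff serves both directions.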

The global view of changes in expres- 
sion of genes with known functions pro- 
vides a vivid picture of the way in which 
the cell adapts to a changing environ- 
ment. Figure 3 shows a portion of the yeast 
metabolic pathways involved in carbon 
and energy metabolism. Mapping the 
changes we observed in the mRNAs en- 
coding each enzyme onto this framework 
allowed us to infer the redirection in the 
flow of metabolites through this system. 
We observed large inductions of the genes 
coding for the enzymes aldehyde dehydrogenase
(ALD2) and acetyl-coenzyme
A (CoA) synthase (ACS1), which function
together to convert the products of
alcohol dehydrogenase into acetyl-CoA,
which in turn is used to fuel the tricarbox- 
ylic acid (TCA) cycle and the glyoxylate 
cycle. The concomitant shutdown of tran- 
scription of the genes encoding pyruvate 
decarboxylase and induction of pyruvate 
carboxylase rechannels pyruvate away 
from acetaldehyde, and instead to oxalac- 
etate, where it can serve to supply the 
TCA cycle and gluconeogenesis. Induction
of the pivotal genes PCK1, encoding
phosphoenolpyruvate carboxykinase, and
FBP1, encoding fructose 1,6-biphosphatase,
switches the directions of two key
irreversible steps in glycolysis, reversing 
the flow of metabolites along the revers- 
ible steps of the glycolytic pathway toward 
the essential biosynthetic precursor, glucose-6-phosphate.
Induction of the genes
coding for the trehalose synthase and gly- 
cogen synthase complexes promotes chan- 
neling of glucose-6-phosphate into these 
carbohydrate storage pathways. 

Just as the changes in expression of 
genes encoding pivotal enzymes can pro- 
vide insight into metabolic reprogram- 
ming, the behavior of large groups of func- 
tionally related genes can provide a broad 
view of the systematic way in which the 
yeast cell adapts to a changing environ- 
ment (Fig. 4). Several classes of genes, 
such as cytochrome c-related genes and 
those involved in the TCA/glyoxylate cy- 
cle and carbohydrate storage, were coordi- 
nately induced by glucose exhaustion. In 
contrast, genes devoted to protein synthe- 
sis, including ribosomal proteins, tRNA 
synthetases, and translation, elongation, 
and initiation factors, exhibited a coordi- 
nated decrease in expression. More than 
95% of ribosomal genes showed at least 
twofold decreases in expression during the 
diauxic shift (Fig. 4) (13). A noteworthy
and illuminating exception was that the 



genes encoding mitochondrial ribosomal
proteins were generally induced rather than
repressed after glucose limitation, highlighting
the requirement for mitochondrial
biogenesis (13). As more is learned about
the functions of every gene in the yeast 
genome, the ability to gain insight into a 
cell's response to a changing environment 
through its global gene expression patterns 
will become increasingly powerful. 

Several distinct temporal patterns of ex- 
pression could be recognized, and sets of 
genes could be grouped on the basis of the 
similarities in their expression patterns. The 
characterized members of each of these 
groups also shared important similarities in 
their functions. Moreover, in most cases, 
common regulatory mechanisms could be 
inferred for sets of genes with similar expres- 
sion profiles. For example, seven genes 
showed a late induction profile, with mRNA 
levels increasing by more than ninefold at



the last timepoint but less than threefold at 
the preceding timepoint (Fig. 5B). All of 
these genes were known to be glucose-re- 
pressed, and five of the seven were previously 
noted to share a common upstream activat- 
ing sequence (UAS), the carbon source re- 
sponse element (CSRE) (16-20). A search 
in the promoter regions of the remaining two 
genes, ACR1 and IDP2, revealed that
ACR1, a gene essential for ACS1 activity,
also possessed a consensus CSRE motif, but
interestingly, IDP2 did not. A search of the
entire yeast genome sequence for the con- 
sensus CSRE motif revealed only four addi- 
tional candidate genes, none of which 
showed a similar induction. 
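
The late-induction criterion described above (more than ninefold at the last time point, less than threefold at the preceding one) can be written as a simple filter. This is only a sketch under the assumption that each gene's profile is stored as a list of expression ratios in time order; the gene labels and values below are invented for illustration.

```python
def late_induced(profiles, late_min=9.0, prior_max=3.0):
    """Return genes whose ratio exceeds late_min at the final time point
    but stays below prior_max at the time point before it.

    profiles maps a gene name to a time-ordered list of expression ratios
    (an assumed data layout, not the authors' database schema).
    """
    selected = []
    for gene, ratios in profiles.items():
        if len(ratios) < 2:
            continue  # need at least two time points to apply the rule
        if ratios[-1] > late_min and ratios[-2] < prior_max:
            selected.append(gene)
    return sorted(selected)

# Invented seven-point profiles, for illustration only.
profiles = {
    "geneA": [1.0, 1.2, 1.5, 2.0, 2.5, 2.8, 9.5],   # late induction
    "geneB": [1.0, 2.0, 3.5, 5.0, 7.0, 8.0, 10.0],  # gradual induction
    "geneC": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],   # unchanged
}
print(late_induced(profiles))
```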

Examples from additional groups of 
genes that shared expression profiles are 
illustrated in Fig. 5, C through F. The 
sequences upstream of the named genes in 
Fig. 5C all contain stress response ele- 
ments (STRE), and with the exception 




Fig. 1. Yeast genome microarray. The actual size of the microarray is 18 mm by 18 mm. The 
microarray was printed as described (9). This image was obtained with the same fluorescent 
scanning confocal microscope used to collect all the data we report (49). A fluorescently labeled
cDNA probe was prepared from mRNA isolated from cells harvested shortly after inoculation (culture 
density of <5 x 10^ cells/ml and media glucose level of 19 g/liter) by reverse transcription in the 
presence of Cy3-dUTP. Similarly, a second probe was prepared from mRNA isolated from cells taken 
from the same culture 9.5 hours later (culture density of ~2 x 10^8 cells/ml, with a glucose level of
<0.2 g/liter) by reverse transcription in the presence of Cy5-dUTP. In this image, hybridization of the
Cy3-dUTP-labeled cDNA (that is, mRNA expression at the initial timepoint) is represented as a green
signal, and hybridization of Cy5-dUTP-labeled cDNA (that is, mRNA expression at 9.5 hours) is 
represented as a red signal. Thus, genes induced or repressed after the diauxic shift appear in this 
image as red and green spots, respectively. Genes expressed at roughly equal levels before and after 
the diauxic shift appear in this image as yellow spots. 
of HSP42, have previously been shown to
be controlled at least in part by these
elements (21-24). Inspection of the sequences
upstream of HSP42 and the two
uncharacterized genes shown in Fig. 5C,
YKL026c, a hypothetical protein with
similarity to glutathione peroxidase, and
YGR043c, a putative transaldolase, revealed
that each of these genes also possesses
repeated upstream copies of the stress-responsive
CCCCT motif. Of the 13 additional
genes in the yeast genome that
shared this expression profile [including
HSP30, ALD2, OM45, and 10 uncharacterized
ORFs (25)], nine contained one or
more recognizable STRE sites in their upstream
regions.

The heterotrimeric transcriptional activator
complex HAP2,3,4 has been shown
to be responsible for induction of several
genes important for respiration (26-28).
This complex binds a degenerate consensus
sequence known as the CCAAT box (26).
Computer analysis, using the consensus sequence
TNRYTGGB (29), has suggested
that a large number of genes involved in
respiration may be specific targets of
HAP2,3,4 (30). Indeed, a putative
HAP2,3,4 binding site could be found in
the sequences upstream of each of the seven
cytochrome c-related genes that showed
the greatest magnitude of induction (Fig.
5D). Of 12 additional cytochrome c-related
genes that were induced, HAP2,3,4 binding
sites were present in all but one. Significantly,
we found that transcription of
HAP4 itself was induced nearly ninefold
concomitant with the diauxic shift.
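
A search for the degenerate consensus TNRYTGGB mentioned above can be sketched by expanding the standard IUPAC nucleotide codes (N = any base, R = A or G, Y = C or T, B = C, G, or T) into a regular expression. Scanning both strands and reporting overlapping matches are assumptions of this sketch, as are the function names and the test sequence; the original computer analysis (29, 30) may have differed.

```python
import re

# Standard IUPAC codes needed for this consensus.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]", "B": "[CGT]"}

def consensus_to_regex(consensus):
    """Expand an IUPAC consensus string into a regular-expression body."""
    return "".join(IUPAC[base] for base in consensus.upper())

def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_sites(upstream, consensus="TNRYTGGB"):
    """Return (strand, start, matched_text) for every match on either strand.

    A lookahead is used so that overlapping sites are all reported.
    Start positions on the "-" strand index the reverse-complemented sequence.
    """
    pattern = re.compile("(?=(" + consensus_to_regex(consensus) + "))")
    hits = []
    for strand, seq in (("+", upstream.upper()),
                        ("-", reverse_complement(upstream))):
        for match in pattern.finditer(seq):
            hits.append((strand, match.start(), match.group(1)))
    return hits

# Invented sequence containing one forward-strand site.
print(find_sites("AAATTGCTGGCAAA"))
```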

Control of ribosomal protein biogenesis
is mainly exerted at the transcriptional
level, through the presence of a common
upstream-activating element (UASrpg)
that is recognized by the Rap1 DNA-binding
protein (31, 32). The expression profiles
of seven ribosomal proteins are shown
in Fig. 5F. A search of the sequences
upstream of all seven genes revealed consensus
Rap1-binding motifs (33). It has
been suggested that declining Rap1 levels
in the cell during starvation may be responsible
for the decline in ribosomal protein
gene expression (34). Indeed, we observed
that the abundance of RAP1
mRNA diminished by 4.4-fold, at about
the time of glucose exhaustion.

Of the 149 genes that encode known or 
putative transcription factors, only two, 
HAP4 and SIP4, were induced by a factor of
more than threefold at the diauxic shift.
SIP4 encodes a DNA-binding transcriptional
activator that has been shown to
interact with Snf1, the "master regulator" of
glucose repression (35). The eightfold in- 
duction of SIP4 upon depletion of glucose 
strongly suggests a role in the induction of 



downstream genes at the diauxic shift. 

Although most of the transcriptional 
responses that we observed were not pre- 
viously known, the responses of many 
genes during the diauxic shift have been 
described. Comparison of the results we 
obtained by DNA microarray hybridiza- 
tion with previously reported results there- 
fore provided a strong test of the sensitiv- 
ity and accuracy of this approach. The 
expression patterns we observed for previ- 
ously characterized genes showed almost 
perfect concordance with previously pub- 
lished results (36). Moreover, the differ- 
ential expression measurements obtained 
by DNA microarray hybridization were reproducible
in duplicate experiments. For
example, the remarkable changes in gene 
expression between cells harvested imme- 
diately after inoculation and immediately 
after the diauxic shift (the first and sixth 
intervals in this time series) were mea- 
sured in duplicate, independent DNA mi- 
croarray hybridizations. The correlation 
coefficient for two complete sets of expres- 
sion ratio measurements was 0.87, and for 
more than 95% of the genes, the expres- 



sion ratios measured in these duplicate 
experiments differed by less than a factor 
of 2. However, in a few cases, there were 
discrepancies between our results and pre- 
vious results, pointing to technical limita- 
tions that will need to be addressed as 
DNA microarray technology advances 
(37, 38). Despite the noted exceptions, 
the high concordance between the results 
we obtained in these experiments and 
those of previous studies provides confidence
in the reliability and thoroughness
of the survey. 
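
The two reproducibility measures reported above (a correlation coefficient between duplicate hybridizations, and the fraction of genes whose ratios agree within a factor of 2) can be computed as in the following sketch. It assumes the correlation is taken on log2-transformed ratios, which the text does not state; the helper names and example values are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def compare_duplicates(ratios_a, ratios_b):
    """Return (correlation, fraction of genes agreeing within twofold).

    Both inputs are expression ratios for the same genes in the same order.
    The log2 transform is an assumption made for this illustration.
    """
    logs_a = [math.log2(r) for r in ratios_a]
    logs_b = [math.log2(r) for r in ratios_b]
    within = sum(1 for a, b in zip(logs_a, logs_b) if abs(a - b) < 1.0)
    return pearson(logs_a, logs_b), within / len(logs_a)

# Invented duplicate measurements for four genes.
r, fraction = compare_duplicates([1.0, 2.0, 4.0, 8.0], [1.0, 2.0, 4.0, 32.0])
print(round(r, 3), fraction)
```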

The changes in gene expression during 
this diauxic shift are complex and involve 
integration of many kinds of information 
about the nutritional and metabolic state 
of the cell. The large number of genes 
whose expression is altered and the diver- 
sity of temporal expression profiles ob- 
served in this experiment highlight the 
challenge of understanding the underlying 
regulatory mechanisms. One approach to 
defining the contributions of individual 
regulatory genes to a complex program of 
this kind is to use DNA microarrays to
identify genes whose expression is affected 



Fig. 2. The section of the array indicated by the gray box in Fig. 1 is shown for each of the
experiments described here. Representative genes are labeled. In each of the arrays used to analyze
gene expression during the diauxic shift, red spots represent genes that were induced relative to the
initial timepoint, and green spots represent genes that were repressed relative to the initial timepoint.
In the arrays used to analyze the effects of the tup1Δ mutation and YAP1 overexpression, red spots
represent genes whose expression was increased, and green spots represent genes whose expression
was decreased by the genetic modification. Note that distinct sets of genes are induced and repressed
in the different experiments. The complete images of each of these arrays can be viewed on the
Internet (13). Cell density as measured by optical density (OD) at 600 nm was used to measure the
growth of the culture.
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by mutations in each putative regulatory 
gene. As a test of this strategy, we analyzed 
the genomewide changes in gene expression 
that result from deletion of the TUP1 gene.
Transcriptional repression of many genes by
glucose requires the DNA-binding repressor
Mig1 and is mediated by recruiting the transcriptional
co-repressors Tup1 and Cyc8/Ssn6 (39).
Tup1 has also been implicated in
repression of oxygen-regulated, mating-type- 
specific, and DNA-damage-inducible genes 
(40). 
Fig. 3. Metabolic reprogramming inferred from global analysis of changes in gene expression. Only key
metabolic intermediates are identified. The yeast genes encoding the enzymes that catalyze each step
in this metabolic circuit are identified by name in the boxes. The genes encoding succinyl-CoA synthase
and glycogen-debranching enzyme have not been explicitly identified, but the ORFs YGR244 and
YPR184 show significant homology to known succinyl-CoA synthase and glycogen-debranching enzymes,
respectively, and are therefore included in the corresponding steps in this figure. Red boxes with
white lettering identify genes whose expression increases in the diauxic shift. Green boxes with dark
green lettering identify genes whose expression diminishes in the diauxic shift. The magnitude of
induction or repression is indicated for these genes. For multimeric enzyme complexes, such as
succinate dehydrogenase, the indicated fold-induction represents an unweighted average of all the
genes listed in the box. Black and white boxes indicate no significant differential expression (less than
twofold). The direction of the arrows connecting reversible enzymatic steps indicates the direction of the
flow of metabolic intermediates, inferred from the gene expression pattern, after the diauxic shift. Arrows
representing steps catalyzed by genes whose expression was strongly induced are highlighted in red.
The broad gray arrows represent major increases in the flow of metabolites after the diauxic shift,
inferred from the indicated changes in gene expression.



Wild-type yeast cells and cells bearing
a deletion of the TUP1 gene (tup1Δ) were
grown in parallel cultures in rich medium
containing glucose as the carbon source.
Messenger RNA was isolated from exponentially
growing cells from the two populations
and used to prepare cDNA labeled
with Cy3 (green) and Cy5 (red),
respectively (11). The labeled probes were
mixed and simultaneously hybridized to
the microarray. Red spots on the microarray
therefore represented genes whose
transcription was induced in the tup1Δ
strain, and thus presumably repressed by
Tup1 (41). A representative section of the
microarray (Fig. 2, bottom middle panel)
illustrates that the genes whose expression
was affected by the tup1Δ mutation were,
in general, distinct from those induced
upon glucose exhaustion [complete images
of all the arrays shown in Fig. 2 are available
on the Internet (13)]. Nevertheless,
34 (10%) of the genes that were induced
by a factor of at least 2 after the diauxic
shift were similarly induced by deletion of
TUP1, suggesting that these genes may be
subject to TUP1-mediated repression by
glucose. For example, SUC2, the gene encoding
invertase, and all five hexose transporter
genes that were induced during the
course of the diauxic shift were similarly
induced, in duplicate experiments, by the
deletion of TUP1.

The set of genes affected by Tup1 in this
experiment also included α-glucosidases,
the mating-type-specific genes MFA1 and
MFA2, and the DNA damage-inducible
RNR2 and RNR4, as well as genes involved
in flocculation and many genes of unknown
function. The hybridization signal corresponding
to expression of TUP1 itself was
also severely reduced because of the (incomplete)
deletion of the transcription unit
in the tup1Δ strain, providing a positive
control in the experiment (42).

Many of the transcriptional targets of
Tup1 fell into sets of genes with related
biochemical functions. For instance, although
only about 3% of all yeast genes
appeared to be TUP1-repressed by a factor
of more than 2 in duplicate experiments
under these conditions, 6 of the 13 genes
that have been implicated in flocculation
(15) showed a reproducible increase in
expression of at least twofold when TUP1
was deleted. Another group of related
genes that appeared to be subject to TUP1
repression encodes the serine-rich cell
wall mannoproteins, such as Tip1 and
Tir1/Srp1, which are induced by cold
shock and other stresses (43), and similar,
serine-poor proteins, the seripauperins
(44). Messenger RNA levels for 23 of the
26 genes in this group were reproducibly
elevated by at least 2.5-fold in the tup1Δ


www.sciencemag.org • SCIENCE • VOL. 278 • 24 OCTOBER 1997



683 



strain, and 18 of these genes were induced
by more than sevenfold when TUP1 was
deleted. In contrast, none of 83 genes that
could be classified as putative regulators of
the cell division cycle were induced more
than twofold by deletion of TUP1. Thus,
despite the diversity of the regulatory systems
that employ Tup1, most of the genes
that it regulates under these conditions
fall into a limited number of distinct functional
classes.

Because the microarray allows us to
monitor expression of nearly every gene in
yeast, we can, in principle, use this approach
to identify all the transcriptional
targets of a regulatory protein like Tup1. It
is important to note, however, that in any
single experiment of this kind we can only
recognize those target genes that are normally
repressed (or induced) under the
conditions of the experiment. For instance,
the experiment described here analyzed
a MATα strain in which MFA1
and MFA2, the genes encoding the a-factor
mating pheromone precursor, are
normally repressed. In the isogenic tup1Δ
strain, these genes were inappropriately
expressed, reflecting the role that Tup1
plays in their repression. Had we instead
carried out this experiment with a MATa
strain (in which expression of MFA1 and
MFA2 is not repressed), it would not have
been possible to conclude anything regarding
the role of Tup1 in the repression
of these genes. Conversely, we cannot distinguish
indirect effects of the chronic
absence of Tup1 in the mutant strain from
effects directly attributable to its participation
in repressing the transcription of a
gene.

Another simple route to modulating the
activity of a regulatory factor is to overexpress
the gene that encodes it. YAP1 encodes
a DNA-binding transcription factor
belonging to the b-zip class of DNA-binding
proteins. Overexpression of YAP1 in
yeast confers increased resistance to hydrogen
peroxide, o-phenanthroline, heavy
metals, and osmotic stress (45). We analyzed
differential gene expression between a
wild-type strain bearing a control plasmid
and a strain with a plasmid expressing YAP1
under the control of the strong GAL1-10
promoter, both grown in galactose (that is,
a condition that induces YAP1 overexpression).
Complementary DNA from the control
and YAP1-overexpressing strains, labeled
with Cy3 and Cy5, respectively, was
prepared from mRNA isolated from the two
strains and hybridized to the microarray.
Thus, red spots on the array represent genes
that were induced in the strain overexpressing
YAP1.

Of the 17 genes whose mRNA levels 
increased by more than threefold when 



YAP1 was overexpressed in this way, five
bear homology to aryl-alcohol oxidoreduc- 
tases (Fig. 2 and Table 1). An additional 
four of the genes in this set also belong to 
the general class of dehydrogenases/oxi- 
doreductases. Very little is known about 
the role of aryl-alcohol oxidoreductases in 
S. cerevisiae, but these enzymes have been 
isolated from ligninolytic fungi, in which 
they participate in coupled redox reactions,
oxidizing aromatic and aliphatic
unsaturated alcohols to aldehydes with the 
production of hydrogen peroxide (46, 47). 
The fact that a remarkable fraction of the 
targets identified in this experiment be- 
long to the same small, functional group of 
oxidoreductases suggests that these genes 

Fig. 4. Coordinated regulation of functionally related genes. The curves represent the average
induction or repression ratios for all the genes in each indicated group. The total number of genes in
each group was as follows: ribosomal proteins, 112; translation elongation and initiation factors, 25;
tRNA synthetases (excluding mitochondrial synthetases), 17; glycogen and trehalose synthesis and
degradation, 15; cytochrome c oxidase and reductase proteins, 19; and TCA- and glyoxylate-cycle
enzymes, 24.

Table 1. Genes induced by YAP1 overexpression. This list includes all the genes for which mRNA levels
increased by more than twofold upon YAP1 overexpression in both of two duplicate experiments, and
for which the average increase in mRNA level in the two experiments was greater than threefold (50).
Positions of the canonical Yap1 binding sites upstream of the start codon, when present, and the
average fold-increase in mRNA levels measured in the two experiments are indicated.



might play an important protective role 
during oxidative stress. Transcription of a 
small number of genes was reduced in the 
strain overexpressing Yap1. Interestingly,
many of these genes encode sugar per- 
meases or enzymes involved in inositol 
metabolism. 

We searched for Yap1-binding sites
(TTACTAA or TGACTAA) in the sequences
upstream of the target genes we
identified (48). About two-thirds of the
genes that were induced by more than
threefold upon Yap1 overexpression had
one or more binding sites within 600 bases
upstream of the start codon (Table 1), suggesting
that they are directly regulated by
Yapl. The absence of canonical Yapl-bind- 
ORF       Yap1 site(s)       Gene  Description                                                      Fold increase
YNL331C                            Putative aryl-alcohol reductase                                  12.9
YKL071W   162-222 (5 sites)        Similarity to bacterial csgA protein                             10.4
YML007W                      YAP1  Transcriptional activator involved in oxidative stress response  9.8
YFL056C   223, 242                 Homology to aryl-alcohol dehydrogenases                          9.0
YLL060C   98                       Putative glutathione transferase                                 7.4
YOL165C   266                      Putative aryl-alcohol dehydrogenase (NADP+)                      7.0
YCR107W                            Putative aryl-alcohol reductase                                  6.5
YML116W   409                ATR1  Aminotriazole and 4-nitroquinoline resistance protein            6.5
YBR008C   142, 167, 364            Homology to benomyl/methotrexate resistance protein              6.1
YCLX08C                            Hypothetical protein                                             6.1
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Putative aryl-alcohol dehydrogenase 


6.0 
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148, 212 


OYE3


NADPH dehydrogenase (old yellow
enzyme), isoform 3


5.8 
167, 317




Homology to hypothetical proteins 
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4.7 
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Homology to hypothetical protein 
YMR251W 


4.5 
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OYE2 


NAD(P)H oxidoreductase (old yellow 
enzyme), isoform 1 


4.1 
Similarity to A. thaliana zeta-crystallin


3.7 








homolog 


3.3 


YOL126C 




MDH2 


Malate dehydrogenase 
ing sites upstream of the others may reflect
an ability of Yap1 to bind sites that differ
from the canonical binding sites, perhaps in
cooperation with other factors, or, less likely,
may represent an indirect effect of Yap1
overexpression, mediated by one or more
intermediary factors. Yap1 sites were found
only four times in the corresponding region
of an arbitrary set of 30 genes that were not
differentially regulated by Yap1.
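
The search for canonical Yap1-binding sites (TTACTAA or TGACTAA) within 600 bases upstream of the start codon can be sketched as follows. The sketch assumes the input sequence ends immediately before the start codon, scans both strands by also matching the reverse complements of the two motifs, and reports each site as the distance from the start codon to the 5' end of the match; that distance convention, the function names, and the test sequences are assumptions made for illustration, and may not correspond to how positions are reported in Table 1.

```python
import re

YAP1_SITES = ("TTACTAA", "TGACTAA")  # canonical sites named in the text

def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def yap1_site_distances(upstream, window=600):
    """Return sorted distances (in bases) of Yap1 sites from the start codon.

    upstream is assumed to end at the base just before the start codon.
    Only the last `window` bases are searched, on both strands.
    """
    region = upstream.upper()[-window:]
    motifs = set(YAP1_SITES) | {reverse_complement(m) for m in YAP1_SITES}
    # Lookahead so that overlapping occurrences are all found.
    pattern = re.compile("(?=(" + "|".join(sorted(motifs)) + "))")
    return sorted(len(region) - m.start() for m in pattern.finditer(region))

# Invented upstream sequence with one site 27 bases from the start codon.
print(yap1_site_distances("G" * 10 + "TTACTAA" + "G" * 20))
```

Running the same function over a set of genes that were not differentially regulated gives a rough background rate, in the spirit of the 30-gene comparison described above.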

Use of a DNA microarray to character- 
ize the transcriptional consequences of 
mutations affecting the activity of regula- 
tory molecules provides a simple and pow- 
erful approach to dissection and character- 
ization of regulatory pathways and net- 



works. This strategy also has an important 
practical application in drug screening. 
Mutations in specific genes encoding can- 
didate drug targets can serve as surrogates 
for the ideal chemical inhibitor or modu- 
lator of their activity. DNA microarrays 
can be used to define the resulting signa- 
ture pattern of alterations in gene expres- 
sion, and then subsequently used in an 
assay to screen for compounds that repro- 
duce the desired signature pattern. 

DNA microarrays provide a simple and economical way to explore gene expression
patterns on a genomic scale. The hurdles to extending this approach to any other
organism are minor. The equipment required for fabricating and using DNA
microarrays (9) consists of components that were chosen for their modest cost and
simplicity. It was feasible for a small group to accomplish the amplification of
more than 6000 genes in about 4 months and, once the amplified gene sequences
were in hand, only 2 days were required to print a set of 110 microarrays of 6400
elements each. Probe preparation, hybridization, and fluorescent imaging are also
simple procedures. Even conceptually simple experiments, as we described here,
can yield vast amounts of information. The value of the information from each
experiment of this kind will progressively increase as more is learned about the
functions of each gene and as additional experiments define the global changes in
gene expression in diverse other natural processes and genetic perturbations.
Perhaps the greatest challenge now is to develop efficient methods for
organizing, distributing, interpreting, and extracting insights from the large
volumes of data these experiments will provide.
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Fig. 5. Distinct temporal patterns of induction or repression help to group genes that share regulatory
properties. (A) Temporal profile of the cell density, as measured by OD at 600 nm and glucose
concentration in the media. (B) Seven genes exhibited a strong induction (greater than ninefold) only at
the last timepoint (20.5 hours). With the exception of IDP2, each of these genes has a CSRE UAS. There
were no additional genes observed to match this profile. (C) Seven members of a class of genes marked
by early induction with a peak in mRNA levels at 18.5 hours. Each of these genes contains STRE motif
repeats in its upstream promoter region. (D) Cytochrome c oxidase and ubiquinol cytochrome c
reductase genes. Marked by an induction coincident with the diauxic shift, each of these genes contains
a consensus binding motif for the HAP2,3,4 protein complex. At least 17 genes shared a similar
expression profile. (E) SAM1, GPP1, and several genes of unknown function are repressed before the
diauxic shift, and continue to be repressed upon entry into stationary phase. (F) Ribosomal protein
genes comprise a large class of genes that are repressed upon depletion of glucose. Each of the genes
profiled here contains one or more RAP1-binding motifs upstream of its promoter. RAP1 is a
transcriptional regulator of most ribosomal proteins.
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than twofold (range 1.6- to 1.9-fold) 
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reviewers for many helpful comments on the manu-
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Examiner: O'Hara, E 



Filing Date: June 28, 2001 

Group Art Unit: 1646 



Mail Stop Appeal Brief-Patents 
Commissioner for Patents 
P.O. Box 1450 
Alexandria, VA 22313-1450 



Sir: 



BRIEF ON APPEAL 



Further to the Notice of Appeal filed November 5, 2003, and received by the USPTO on
November 7, 2003, herewith are three copies of Appellants' Brief on Appeal. Authorized fees
include the $330.00 fee for the filing of this Brief.

This is an appeal from the decision of the Examiner finally rejecting claims 1-6 of the 
above-identified application. 



(1) REAL PARTY IN INTEREST 
The above-identified application is assigned of record to Incyte Pharmaceuticals, Inc., 
(now Incyte Corporation, formerly known as Incyte Genomics, Inc.) (Reel 012272, Frame 0191)
which is the real party in interest herein. 
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(2) RELATED APPEALS AND INTERFERENCES
Appellants, their legal representative and the assignee are not aware of any related 
appeals or interferences which will directly affect or be directly affected by or have a bearing on 
the Board's decision in the instant appeal. 

(3) STATUS OF THE CLAIMS 
Claims rejected: Claims 1-6. 
Claims allowed: (none). 
Claims canceled: Claims 13-20. 
Claims withdrawn: Claims 7-12. 

Claims on Appeal: Claims 1-6 (A copy of the claims on appeal, as amended, can be
found in the attached Appendix.)

(4) STATUS OF AMENDMENTS AFTER FINAL
There were no amendments submitted after Final Rejection. 

(5) SUMMARY OF THE INVENTION
Appellants' invention is directed to polynucleotides encoding a human G-protein coupled
receptor (GPCR), SEQ ID NO:1, in particular, a metabotropic glutamate GPCR, based on the
conservation of various sequence motifs characteristic of this family of proteins, in particular, the
seven hydrophobic transmembrane domains characteristic of GPCRs. See specification, at page
11 and Table 1 and Figure 1. The glutamate GPCRs are described in the specification and the art
of record as important in neurotransmission and involved in neurological disorders such as
epilepsy, stroke, and neurodegeneration. See specification, at page 2. Polynucleotides encoding
SEQ ID NO:1 are also disclosed as differentially expressed in thyroid tumors, in particular,
follicular carcinoma based on Northern analysis in thyroid tissues. See specification, at page 35.
The claimed polynucleotides are asserted to be useful in the diagnosis, treatment, and evaluation
of therapies for neurological and neoplastic disorders, in particular, follicular carcinoma.
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(6) ISSUES 

1. Whether claims 1-6 directed to SEQ ID NO:1 encoding polynucleotides meet the
utility requirement of 35 U.S.C. § 101. In particular, whether the conservation of sequence motifs
and domains between the protein coded for by the claimed polynucleotide and metabotropic
GPCRs, known to have utility in neurotransmission and neurological disorders, demonstrates a
"substantial likelihood" of utility under 35 U.S.C. § 101. Whether there is evidence that the
differential expression of the polynucleotide encoding SEQ ID NO:1 in thyroid tumors provides
a substantial likelihood of utility for the claimed polynucleotides in the detection and diagnosis
of thyroid tumors.

2. Whether one of ordinary skill in the art would know how to use the claimed
polynucleotides, e.g., in toxicology testing, drug development, and the diagnosis of disease, so as
to satisfy the enablement requirement of 35 U.S.C. § 112, first paragraph.

3. Whether fragments and variants of the polynucleotides encoding SEQ ID NO:1
are sufficiently described in the specification that the skilled artisan would recognize applicant's
possession of them at the time the application was filed in accordance with 35 U.S.C. § 112, first
paragraph.

4. Whether the claimed polynucleotides are sufficiently described in priority
application Serial No. 09/516,513, filed September 17, 1998 to meet the requirements of 35
U.S.C. § 112, first paragraph and claim an effective priority date of September 17, 1998 with
respect to the now claimed invention.

(7) GROUPING OF THE CLAIMS 

As to Issue 1

All of the claims on appeal stand or fall together.
As to Issue 2

All of the claims on appeal stand or fall together.
As to Issue 3
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All of the claims on appeal stand or fall together.
As to Issue 4

Claims 1 and 3-6 stand or fall together.

(8) APPELLANTS' ARGUMENTS 

The rejection of claims 1-6 under 35 U.S.C. §§ 101 and 112, first paragraph is improper, as 
the inventions of those claims have a patentable utility as set forth in the instant specification, 
and/or a utility well known to one of ordinary skill in the art. 

Claims 1-6 stand rejected under 35 U.S.C. §§ 101 and 112, first paragraph, based on the
allegation that the claimed invention lacks patentable utility. The rejection alleges in particular 
that: 

• the claimed invention is not supported by either a substantial and specific asserted utility
or a well-established utility. None of the described uses are considered to be specific or
substantial utilities for either the protein or encoding nucleic acid molecules. Methods
such as identification of ligands, use to screen for homologous genes, use to identify
chromosomes or chromosomal locations, use to recombinantly produce protein or to
generate antibodies are considered general methods applicable to any protein and/or
nucleic acid.

• Applicants' assertion that the claimed polynucleotide can be used in cancer diagnosis, in
particular follicular carcinoma of the thyroid, is unconvincing because the correlation
between the expression of the polynucleotide and follicular carcinoma is based on one
single library. The determination of a cancer marker must be based on studying results
from considerable number of patients, and statistical analysis. See Guidelines for Marker
Development by the National Cancer Institute (NCI).

The invention at issue is a polynucleotide corresponding to a gene that is expressed in 
humans. The novel polynucleotide codes for a polypeptide demonstrated in the patent 
specification to be a member of the class of glutamate GPCRs, whose biological functions include
control of neurotransmission. The claimed invention has numerous practical, beneficial uses in
toxicology testing, drug development, and the diagnosis of disease, none of which requires 
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knowledge of how the polypeptide coded for by the polynucleotide actually functions. The 
claimed invention can be used, for example, as a marker for cancers of the thyroid, in particular, 
follicular carcinoma. See specification, at page 35. As a result of the benefits of these uses, the 
claimed invention already enjoys significant commercial success. 

Applicants have previously submitted a Declaration by Dr. John C. Rockett showing the 
many reasons why the use of the claimed polynucleotides in gene expression profiling studies in 
toxicology testing would be readily apparent to the skilled artisan at the time the application was 
filed. 

Applicants further submit two additional expert Declarations by Dr. Vishwanath R. Iyer 
and Dr. Tod Bedilion under 37 C.F.R. § 1.132, with respective attachments, and ten (10)
scientific references filed before the September 17, 1998 priority date of the instant application.
The Rockett Declaration, Iyer Declaration, Bedilion Declaration, and the ten (10) references fully
establish that, prior to the September 17, 1998 filing date of the parent Bandman '513 
application, it was well-established in the art that: 

polynucleotides derived from nucleic acids expressed in one or 
more tissues and/or cell types can be used as hybridization probes - that is, as 
tools -- to survey for and to measure the presence, the absence, and the amount 
of expression of their cognate gene; 

with sufficient length, at sufficient hybridization stringency, and 
with sufficient wash stringency -- conditions that can be routinely established -- 
expressed polynucleotides, used as probes, generate a signal that is specific to 
the cognate gene, that is, produce a gene-specific expression signal; 

expression analysis is useful, inter alia, in drug discovery and 
lead optimization efforts, in toxicology, particularly toxicology studies
conducted early in drug development efforts, and in phenotypic 
characterization and categorization of cell types, including neoplastic cell 
types; 

each additional gene-specific probe used as a tool in expression 
analysis provides an additional gene-specific signal that could not otherwise 
have been detected, giving a more comprehensive, robust, higher resolution, 
statistically more significant, and thus more useful expression pattern in such
analyses than would otherwise have been possible; 

biologists, such as toxicologists, recognize the increased utility
of more comprehensive, robust, higher resolution, statistically more significant 
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results, and thus want each newly identified expressed gene to be included in 
such an analysis; 

nucleic acid microarrays increase the parallelism of expression 
measurements, providing expression data analogous to that provided by older, 
lower throughput techniques, but at substantially increased throughput; 

accordingly, when expression profiling is performed using 
microarrays, each additional gene-specific probe that is included as a signaling 
component on this analytical device increases the detection range, and thus 
versatility, of this research tool;

biologists, such as toxicologists, recognize the increased utility 
of such improved tools, and thus want a gene-specific probe to each newly 
identified expressed gene to be included in such an analytical device; 

the industrial suppliers of microarrays recognize the increased 
utility of such improved tools to their customers, and thus strive to improve 
salability of their microarrays by adding each newly identified expressed gene
to the microarrays they sell; 

it is not necessary that the biological function of a gene be 
known for measurement of its expression to be useful in drug discovery and 
lead optimization analyses, toxicology, or molecular phenotyping experiments; 

failure of a probe to detect changes in expression of its cognate 
gene does not diminish the usefulness of the probe as a research tool; and 

failure of a probe completely to detect its cognate transcript in 
any single expression analysis experiment does not deprive the probe of 
usefulness to the community of users who would use it as a research tool. ] 

The Patent Examiner does not dispute that the claimed polynucleotide can be used as a 
probe in cDNA microarrays and used in gene expression monitoring applications. Instead, the 
Patent Examiner contends that the claimed polynucleotide cannot be useful without precise 
knowledge of its biological function, or the biological function of the polypeptide it encodes. 
But the law has never required knowledge of biological function to prove utility. It is the 
claimed invention's uses, not its functions, that are the subject of a proper analysis under the 
utility requirement. 

In any event, as demonstrated by the Rockett Declaration, the Iyer Declaration, and the 
Bedilion Declaration, the person of ordinary skill in the art can achieve beneficial results from 
the claimed polynucleotide in the absence of any knowledge as to the precise function of the
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protein encoded by it. The uses of the claimed polynucleotide in gene expression monitoring 
applications are in fact independent of its precise biological function. 

I. The applicable legal standard 

To meet the utility requirement of sections 101 and 112 of the Patent Act, the patent
applicant need only show that the claimed invention is "practically useful," Anderson v. Natta,
480 F.2d 1392, 1397, 178 USPQ 458 (CCPA 1973) and confers a "specific benefit" on the
public. Brenner v. Manson, 383 U.S. 519, 534-35, 148 USPQ 689 (1966). As discussed in a
recent Court of Appeals for the Federal Circuit case, this threshold is not high:

An invention is "useful" under section 101 if it is capable of providing some identifiable 
benefit. See Brenner v. Manson, 383 U.S. 519, 534 [148 USPQ 689] (1966); Brooktree 
Corp. v. Advanced Micro Devices, Inc., 977 F.2d 1555, 1571 [24 USPQ2d 1401] (Fed.
Cir. 1992) ("to violate Section 101 the claimed device must be totally incapable of
achieving a useful result"); Fuller v. Berger, 120 F. 274, 275 (7th Cir. 1903) (test for
utility is whether invention "is incapable of serving any beneficial end").

Juicy Whip Inc. v. Orange Bang Inc., 51 USPQ2d 1700 (Fed. Cir. 1999).

While an asserted utility must be described with specificity, the patent applicant need not
demonstrate utility to a certainty. In Stiftung v. Renishaw PLC, 945 F.2d 1173, 1180,
20 USPQ2d 1094 (Fed. Cir. 1991), the United States Court of Appeals for the Federal Circuit 
explained: 

An invention need not be the best or only way to accomplish a certain result, and it need 
only be useful to some extent and in certain applications: "[T]he fact that an invention has
only limited utility and is only operable in certain applications is not grounds for finding
lack of utility." Envirotech Corp. v. Al George, Inc., 730 F.2d 753, 762, 221 USPQ 473,
480 (Fed. Cir. 1984).

The specificity requirement is not, therefore, an onerous one. If the asserted utility is 
described so that a person of ordinary skill in the art would understand how to use the claimed 
invention, it is sufficiently specific. See Standard Oil Co. v. Montedison, S.p.a. , 212 U.S.P.Q. 
327, 343 (3d Cir. 1981). The specificity requirement is met unless the asserted utility amounts to
a "nebulous expression" such as "biological activity" or "biological properties" that does not
convey meaningful information about the utility of what is being claimed. Cross v. Iizuka,
753 F.2d 1040, 1048 (Fed. Cir. 1985). 
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In addition to conferring a specific benefit on the public, the benefit must also be 
"substantial." Brenner, 383 U.S. at 534. A "substantial" utility is a practical, "real-world" utility. 
Nelson v. Bowler, 626 F.2d 853, 856, 206 USPQ 881 (CCPA 1980).

If persons of ordinary skill in the art would understand that there is a "well-established" 
utility for the claimed invention, the threshold is met automatically and the applicant need not 
make any showing to demonstrate utility. Manual of Patent Examining Procedure at § 706.03(a). 
Only if there is no "well-established" utility for the claimed invention must the applicant
demonstrate the practical benefits of the invention. Id. 

Once the patent applicant identifies a specific utility, the claimed invention is presumed 
to possess it. In re Cortright, 165 F.3d 1353, 1357, 49 USPQ2d 1464 (Fed. Cir. 1999); In re 
Brana, 51 F.3d 1560, 1566, 34 USPQ2d 1436 (Fed. Cir. 1995). In that case, the Patent Office
bears the burden of demonstrating that a person of ordinary skill in the art would reasonably 
doubt that the asserted utility could be achieved by the claimed invention. Id. To do so, the 
Patent Office must provide evidence or sound scientific reasoning. See In re Langer, 503 F.2d 
1380, 1391-92, 183 USPQ 288 (CCPA 1974). If and only if the Patent Office makes such a 
showing, the burden shifts to the applicant to provide rebuttal evidence that would convince the 
person of ordinary skill that there is sufficient proof of utility. Brana, 51 F.3d at 1566. The
applicant need only prove a "substantial likelihood" of utility; certainty is not required. Brenner,
383 U.S. at 532. 

II. Toxicology Testing and disease diagnosis are sufficient utilities under 35 U.S.C. 
§§ 101 and 112, first paragraph 

The claimed invention meets all of the necessary requirements for establishing a credible
utility under the Patent Law: There are "well-established" uses for the claimed invention known 
to persons of ordinary skill in the art, and there are specific practical and beneficial uses for the 
invention disclosed in the patent application's specification. These uses are explained, in detail, 
in the Rockett Declaration, Iyer Declaration, and Second Bedilion Declaration accompanying 
this brief or previously submitted. Objective evidence, not considered by the Patent Office,
further corroborates the credibility of the asserted utilities. 
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A. The use of the claimed SEQ ID NO:1 encoding polynucleotides for toxicology
testing, drug discovery, and disease diagnosis are practical uses that confer
"specific benefits" to the public

The claimed invention has specific, substantial, real-world utility by virtue of its use in
toxicology testing, drug development and disease diagnosis through gene expression profiling. 
These uses are explained in detail in the accompanying Rockett Declaration, Iyer Declaration, 
and Bedilion Declaration, the substance of which is not rebutted by the Patent Examiner. There 
is no dispute that the claimed invention is in fact a useful tool in cDNA microarrays used to
perform gene expression analysis. That is sufficient to establish utility for the claimed
polynucleotide. 

In his Declaration, Dr. Rockett explains the many reasons why a person skilled in the art
in 1998 would have understood that any expressed polynucleotide is useful for a number of gene
expression monitoring applications, e.g., in cDNA microarrays, in connection with the
development of drugs and the monitoring of the activity of such drugs. (Rockett Declaration at,
e.g., ¶¶ 10-18):

It is my opinion, therefore, based on the state of the art in toxicology at least since
the mid-1990s . . . that disclosure of the sequence of a new gene or protein, with or
without knowledge of its biological function, would have been sufficient information
for a toxicologist to use the gene and/or protein in expression profiling studies in
toxicology.* [Rockett Declaration, ¶ 18.]

In his Declaration, Dr. Bedilion explains why a person of skill in the art in 1998 would
have understood that any expressed polynucleotide is useful for gene expression monitoring
applications using cDNA microarrays. (Bedilion Declaration, e.g., ¶¶ 4-7.) In his Declaration,
Dr. Iyer explains why a person of skill in the art in 1998 would have understood that any
expressed polynucleotide is useful for gene expression monitoring applications using cDNA
microarrays, stating that "[t]o provide maximum versatility as a research tool, the microarray
should include -- and as a biologist I would want my microarray to include -- each newly
identified gene as a probe." (Iyer Declaration, ¶ 9.)

"Use of the words 'it is my opinion' to preface what someone of ordinary skill in the art
would have known does not transform the factual statements contained in the declaration into
opinion testimony." In re Alton, 37 USPQ2d 1578, 1583 (Fed. Cir. 1996).
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In addition, Dr. Rockett explains in his Declaration that "there are a number of other 
differential expression analysis technologies that precede the development of microarrays, some 
by decades, and that have been applied to drug metabolism and toxicology research, including: 
(1) differential screening; (2) subtractive hybridization, including variants such as chemical 
cross-linking subtraction, suppression-PCR subtractive hybridization and representational 
difference analysis; (3) differential display; (4) restriction endonuclease facilitated analyses, 
including serial analysis of gene expression (SAGE) and gene expression fingerprinting and (5) 
EST analysis." (Rockett Declaration, ¶ 7.)

Nowhere does the Patent Examiner address the fact that, as described on pages 31-32 of 
the Bandman '513 application, the claimed polynucleotides can be used as highly specific probes 
in, for example, cDNA microarrays -- probes that without question can be used to measure both
the existence and amount of complementary RNA sequences known to be the expression 
products of the claimed polynucleotides. The claimed invention is not, in that regard, some 
random sequence whose value as a probe is speculative or would require further research to 
determine. 

Given the fact that the claimed polynucleotide is known to be expressed, its utility as a 
measuring and analyzing instrument for expression levels is as indisputable as a scale's utility for
measuring weight. This use as a measuring tool, regardless of how the expression level data
ultimately would be used by a person of ordinary skill in the art, by itself demonstrates that the
claimed invention provides an identifiable, real-world benefit that meets the utility requirement.
Raytheon v. Roper, 724 F.2d 951 (Fed. Cir. 1983) (claimed invention need only meet one of its
stated objectives to be useful); In re Cortright, 165 F.3d 1353, 1359 (Fed. Cir. 1999) (how the
invention works is irrelevant to utility); MPEP § 2107 ("Many research tools such as gas
chromatographs, screening assays, and nucleotide sequencing techniques have a clear, specific,
and unquestionable utility (e.g., they are useful in analyzing compounds)." (emphasis added)).

Literature reviews published shortly before the filing of the Bandman '513 application
describing the state of the art further confirm the claimed invention's utility. Rockett et al.
confirm, for example, that the claimed invention is useful for differential expression analysis
regardless of how expression is regulated: 
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Despite the development of multiple technological advances which have recently
brought the field of gene expression profiling to the forefront of molecular
analysis, recognition of the importance of differential gene expression and
characterization of differentially expressed genes has existed for many years.

Although differential expression technologies are applicable to a broad range of
models, perhaps their most important advantage is that, in most cases, absolutely
no prior knowledge of the specific genes which are up- or down-regulated is
required.

* * *

Whereas it would be informative to know the identity and functionality of all
genes up/down regulated by . . . toxicants, this would appear a longer term goal
. . . . However, the current use of gene profiling yields a pattern of gene changes
for a xenobiotic of unknown toxicity which may be matched to that of well
characterized toxins, thus alerting the toxicologist to possible in vivo similarities
between the unknown and the standard, thereby providing a platform for more
extensive toxicological examination. (emphasis in original)

Rockett et al., Differential gene expression in drug metabolism and toxicology: practicalities,
problems and potential, Xenobiotica 29:655-691 (July 1999) (Rockett Declaration, Exhibit C).

In another pre-September 1998 article, Lashkari et al. state explicitly that sequences that
are merely "predicted" to be expressed (predicted Open Reading Frames, or ORFs) -- the claimed
invention in fact is known to be expressed -- have numerous uses:

Efforts have been directed toward the amplification of each predicted ORF or any
other region of the genome ranging from a few base pairs to several kilobase
pairs. There are many uses for these amplicons -- they can be cloned into standard
vectors or specialized expression vectors, or can be cloned into other specialized
vectors such as those used for two-hybrid analysis. The amplicons can also be
used directly by, for example, arraying onto glass for expression analysis, for
DNA binding assays, or for any direct DNA assay. (emphasis added)

Lashkari et al., Whole genome analysis: Experimental access to all genome sequenced segments
through larger-scale efficient oligonucleotide synthesis and PCR, Proc. Nat. Acad. Sci.
94:8945-8947 (Aug. 1997) (Reference No. 1).






09/895,686 



Docket No.: PC-0044 CIP 

B. The use of polynucleotides coding for polypeptides expressed by humans as 
tools for toxicology testing, drug discovery, and the diagnosis of disease is 
now "well-established" 

The technologies made possible by expression profiling and the DNA tools upon which
they rely are now well-established. The technical literature recognizes not only the prevalence of 
these technologies, but also their unprecedented advantages in drug development, testing and 
safety assessment. These technologies include toxicology testing, e.g., as described by Bedilion, 
Rockett, and Iyer in their Declarations. 

Toxicology testing is now standard practice in the pharmaceutical industry. See, e.g., 
John C. Rockett et al., supra:

Knowledge of toxin-dependent regulation in target tissues is not solely an academic 
pursuit as much interest has been generated in the pharmaceutical industry to harness this 
technology in the early identification of toxic drug candidates, thereby shortening the
developmental process and contributing substantially to the safety assessment of new
drugs. (Rockett Declaration, Exhibit C, page 656)

To the same effect are several other scientific publications, including Emile F. Nuwaysir et al.,
Microarrays and toxicology: The advent of toxicogenomics, Molecular Carcinogenesis
24:153-159 (1999) (Reference No. 2); Sandra Steiner and N. Leigh Anderson, Expression profiling in
toxicology - potentials and limitations, Toxicology Letters 112-13:467-471 (2000) (Reference
No. 3).

Nucleic acids useful for measuring the expression of whole classes of genes are routinely 
incorporated for use in toxicology testing. Nuwaysir et al. describes, for example, a Human 
ToxChip comprising 2089 human clones, which were selected 

for their well-documented involvement in basic cellular processes as well as their 
responses to different types of toxic insult. Included on this list are DNA replication and
repair genes, apoptosis genes, and genes responsive to PAHs and dioxin-like compounds, 
peroxisome proliferators, estrogenic compounds, and oxidant stress. Some of the other 
categories of genes include transcription factors, oncogenes, tumor suppressor genes, 
cyclins, kinases, phosphatases, cell adhesion and motility genes, and homeobox genes. 
Also included in this group are 84 housekeeping genes, whose hybridization intensity is
averaged and used for signal normalization of the other genes on the chip.

See also Table 1 of Nuwaysir et al. (listing additional classes of genes deemed to be of special 
interest in making a human toxicology microarray). 
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The more genes that are available for use in toxicology testing, the more powerful the 
technique. "Arrays are at their most powerful when they contain the entire genome of the species 
they are being used to study." John C. Rockett and David J. Dix, Application of DNA arrays to
toxicology, Environ. Health Perspec. 107:681-685 (1999) (Reference No. 4). Control genes are
carefully selected for their stability across a large set of array experiments in order to best study 
the effect of toxicological compounds. See attached email from the primary investigator on the 
Nuwaysir paper, Dr. Cynthia Afshari, to an Incyte employee, dated July 3, 2000, as well as the 
original message to which she was responding (Reference No. 5), indicating that even the
expression of carefully selected control genes can be altered. Thus, there is no expressed gene 
which is irrelevant to screening for toxicological effects, and all expressed genes have a utility 
for toxicological screening. 

Further evidence of the well-established utility of all expressed polypeptides and 
polynucleotides in toxicology testing is found in U.S. Pat. No. 5,569,588 (Reference No. 9e) 
and published PCT applications WO 95/21944 (Reference No. 9a), WO 95/20681
(Reference No. 9b), and WO 97/13877 (Reference No. 9g). 

WO 95/21944 ("Differentially expressed genes in healthy and diseased subjects"), 
published August 17, 1995, describes the use of microarrays in expression profiling analyses, 
emphasizing that patterns of expression can be used to distinguish healthy tissues from diseased 
tissues and that patterns of expression can additionally be used in drug development and 
toxicology studies, without knowledge of the biological function of the encoded gene product. 
In particular, and with emphasis added: 

The present invention involves . . . methods for diagnosing diseases . . .
characterized by the presence of [differentially expressed] . . . genes, despite the
absence of knowledge about the gene or its function. The methods involve the use
of a composition suitable for use in hybridization which consists of a solid surface
on which is immobilized at pre-defined regions thereon a plurality of defined
oligonucleotide/ polynucleotide sequences for hybridization. Each sequence 
comprises a fragment of an EST . . . . Differences in hybridization patterns produced 
through use of this composition and the specified methods enable diagnosis of 
diseases based on differential expression of genes of unknown function . . . . 
[abstract] 

The method [of the present invention] involves producing and comparing 
hybridization patterns formed between samples of expressed mRNA or cDNA 
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polynucleotide sequences ... and a defined set of oligonucleotide/polynucleotide
. . . immobilized on a support. Those defined [immobilized] 
oligonucleotide/polynucleotide sequences are representative of the total expressed 
genetic component of the cells, tissues, organs or organism as defined by the 
collection of partial cDNA sequences (ESTs). [page 2] 

The present invention meets the unfilled needs in the art by providing 
methods for the . . . use of gene fragments and genes, even those of unknown full 
length sequence and unknown function, which are differentially expressed in a 
healthy animal and in an animal having a specific disease or infection by use of 
ESTs derived from DNA libraries of healthy and/or diseased/infected animals. 
[page 4] 

Yet another aspect of the invention is that it provides ... a means for . . . 
monitoring the efficacy of disease treatment regimes including . . . toxicological 
effects thereof. [page 4] 

It has been appreciated that one or more differentially identified EST or 
gene-specific oligonucleotide/polynucleotides define a pattern of differentially 
expressed genes diagnostic of a predisease, disease or infective state. A knowledge 
of the specific biological function of the EST is not required only that the EST 
identifies a gene or genes whose altered expression is associated reproducibly with 
the predisease, disease or infectious state. [page 4] 

As used herein, the term 'disease' or 'disease state' refers to any condition 
which deviates from a normal or standardized healthy state in an organism of the 
same species in terms of differential expression of the organism's genes, 
[whether] of genetic or environmental origin, for example, an inherited disorder 
such as certain breast cancers [or] administration of a drug or exposure of the 
animal to another agent, e.g., nutrition, which affects gene expression. [page 5] 

As used herein, the term 'solid support' refers to any known substrate which 
is useful for the immobilization of large numbers of oligonucleotide/polynucleotide 
sequences by any available method . . . [and includes, inter alia,] nitrocellulose, 
glass, silica. . . . [page 6] 

By 'EST' or 'Expressed Sequence Tag' is meant a partial DNA or cDNA 
sequence of about 150 to 500, more preferably about 300, sequential nucleotides. 
[page 6] 

One or more libraries made from a single tissue type typically provide at 
least about 3000 different (i.e., unique) ESTs and potentially the full complement of 
all possible ESTs representing all cDNAs, e.g., 50,000 to 100,000 in an animal such 
as a human. [page 7] 

The lengths of the defined oligonucleotide/polynucleotides may be readily 
increased or decreased as desired or needed. . . . The length is generally guided by 
the principle that it should be of sufficient length to insure that it is on average 
only represented once in the population to be examined. [page 7] 

Comparing the . . . hybridization patterns permits detection of those defined 
oligonucleotide/polynucleotides which are differentially expressed between the 
healthy control and the disease sample by the presence of differences in the 
hybridization patterns at pre-defined regions [of the solid support]. [page 13] 

It should be appreciated that one does not have to be restricted in using 
ESTs from a particular tissue from which probe RNA or cDNA is obtained[;] rather 
any or all ESTs (known or unknown) may be placed on the support. Hybridization 
will be used [to] form diagnostic patterns or to identify which particular EST is 
detected. For example, all known ESTs from an organism are used to produce a 
'master' solid support to which control sample and disease samples are alternately 
hybridized. [page 14] 

Diagnosis is accomplished by comparing the two hybridization patterns, 
wherein substantial differences between the first and second hybridization patterns 
indicate the presence of the selected disease or infection in the animal being tested. 
Substantially similar first and second hybridization patterns indicate the absence of 
disease or infection. This[,] like many of the foregoing embodiments[,] may use 
known or unknown ESTs derived from many libraries. [page 18] 

Still another intriguing use of this method is in the area of monitoring the 
effects of drugs on gene expression, both in laboratories and during clinical trials 
with animal[s], especially humans. [page 18] 

WO 95/20681 ("Comparative Gene Transcript Analysis"), filed in 1994 by 
Appellants' assignee and published August 3, 1995, has three issued U.S. counterparts: 
U.S. Pat. Nos. 5,840,484, issued November 24, 1998; 6,114,114, issued September 5, 2000; and 
6,303,297, issued October 16, 2001. 

The specification describes the use of transcript expression patterns, or "images", 
each comprising multiple pixels of gene-specific information, for diagnosis, for cellular 
phenotyping, and in toxicology and drug development efforts. The specification describes a 
plurality of methods for obtaining the requisite expression data - one of which is microarray 
hybridization - and equates the uses of the expression data from these disparate platforms. In 
particular, and with emphasis added: 



The invention provides a "method and system for quantifying the relative 
abundance of gene transcripts in a biological specimen. . . . [G]ene transcript 
imaging can be used to detect or diagnose a particular biological state, disease, or 
condition which is correlated to the relative abundance of gene transcripts in a 
given cell or population of cells. The invention provides a method for comparing 
the gene transcript image analysis from two or more different biological specimens 
in order to distinguish between the two specimens and identify one or more genes 
which are differentially expressed between the two specimens." [abstract] 

"[W]e see each individual gene product as a 'pixel' of information which 
relates to the expression of that, and only that, gene. We teach herein [] methods 
whereby the individual 'pixels' of gene expression information can be combined 
into a single gene transcript 'image,' in which each of the individual genes can be 
visualized simultaneously and allowing relationships between the gene pixels to be 
easily visualized and understood." [page 2] 

"The present invention avoids the drawbacks of the prior art by providing a 
method to quantify the relative abundance of multiple gene transcripts in a given 
biological specimen. . . . The method of the instant invention provides for detailed 
diagnostic comparisons of cell profiles revealing numerous changes in the 
expression of individual transcripts." [page 6] 

"High resolution analysis of gene expression be used directly as a diagnostic 
profile. . . ." [page 7] 

"The method is particularly powerful when more than 100 and preferably 
more than 1,000 gene transcripts are analyzed." [page 7] 

"The invention . . . includes a method of comparing specimens containing 
gene transcripts." [page 7] 

"The final data values from the first specimen and the further identified 
sequence values from the second specimen are processed to generate ratios of 
transcript sequences, which indicate the differences in the number of gene 
transcripts between the two specimens." [i.e., the results yield analogous data to 
microarrays] [page 8] 

"Also disclosed is a method of producing a gene transcript image analysis 
by first obtaining a mixture of mRNA, from which cDNA copies are made." [page 

"In a further embodiment, the relative abundance of the gene transcripts in 
one cell type or tissue is compared with the relative abundance of gene transcript 
numbers in a second cell type or tissue in order to identify the differences and 
similarities." [page 9] 

"In essence, the invention is a method and system for quantifying the 
relative abundance of gene transcripts in a biological specimen. The invention 
provides a method for comparing the gene transcript image from two or more 
different biological specimens in order to distinguish between the two specimens 
. . . ." [page 9] 

"[T]wo or more gene transcript images can be compared and used to detect 
or diagnose a particular biological state, disease, or condition which is correlated to 
the relative abundance of gene transcripts in a given cell or population of cells. . . ." 
[pages 9-10] 

"The present invention provides a method to compare the relative 
abundance of gene transcripts in different biological specimens. . . . This process is 
denoted herein as gene transcript imaging. The quantitative analysis of the relative 
abundance for a set of gene transcripts is denoted herein as 'gene transcript image 
analysis' or 'gene transcript frequency analysis'. The present invention allows one 
to obtain a profile for gene transcription in any given population of cells or tissue 
from any type of organism." [page 11] 

"The invention has significant advantages in the fields of diagnostics, 
toxicology and pharmacology, to name a few." [page 12] 

"[G]ene transcript sequence abundances are compared against reference 
database sequence abundances including normal data sets for diseased and healthy 
patients. The patient has the disease(s) with which the patient's data set most 
closely correlates." [page 12] 

"For example, gene transcript frequency analysis can be used to differentiate 
normal cells or tissues from diseased cells or tissues. . . ." [page 12] 

"In toxicology, . . . [g]ene transcript imaging provides highly detailed 
information on the cell and tissue environment, some of which would not be 
obvious in conventional, less detailed screening methods. The gene transcript 
image is a more powerful method to predict drug toxicity and efficacy. Similar 
benefits accrue in the use of this tool in pharmacology. . . ." [page 12] 

"In an alternative embodiment, comparative gene transcript frequency 
analysis is used to differentiate between cancer cells which respond to anti-cancer 
agents and those which do not respond." [page 12] 

"In a further embodiment, comparative gene transcript frequency analysis is 
used ... for the selection of better pharmacologic animal models." [page 14] 

"In a further embodiment, comparative gene transcript frequency analysis is 
used in a clinical setting to give a highly detailed gene transcript profile of a 
diseased state or condition." [page 14] 

"An alternate method of producing a gene transcript image includes the 
steps of obtaining a mixture of test mRNA and providing a representative array of 
unique probes whose sequences are complementary to at least some of the test 
mRNAs. Next, a fixed amount of the test mRNA is added to the arrayed probes. 
The test mRNA is incubated with the probes for a sufficient time to allow hybrids 
of the test mRNA and probes to form. The mRNA-probe hybrids are detected and 
the quantity determined." [page 15] 

"[T]his research tool provides a way to get new drugs to the public faster 
and more economically." [page 36] 

"In this method, the particular physiologic function of the protein transcript 
need not be determined to qualify the gene transcript as a clinical marker." [page 
38] 

"[T]he gene transcript changes noted in the earlier rat toxicity study are 
carefully evaluated as clinical markers in the followed patients. Changes in the 
gene transcript image analyses are evaluated as indicators of toxicity by correlation 
with clinical signs and symptoms and other laboratory results. . . . The . . . analysis 
highlights any toxicological changes in the treated patients." [page 39] 

U.S. Pat. No. 5,569,588 ("Methods for Drug Screening") ("the '588 patent"), 
issued October 29, 1996, with a priority date of August 1995, describes an expression profiling 
platform, the "genome reporter matrix", which is different from nucleic acid microarrays. 
Additionally describing use of nucleic acid microarrays, the '588 patent makes clear that the 
utility of comparing multidimensional expression datasets is independent of the methods by 
which such profiles are obtained. The '588 patent speaks clearly to the usefulness of such 
expression analyses in drug development and toxicology, particularly pointing out that a gene's 
failure to change in expression level is a useful result. Thus, with emphasis added: 

The invention provides "[m]ethods and compositions for modeling the 
transcriptional responsiveness of an organism to a candidate drug. . . . [The final 
step of the method comprises] comparing reporter gene product signals for each cell 
before and after contacting the cell with the candidate drug to obtain a drug 
response profile which provides a model of the transcriptional responsiveness of 
said organism to the candidate drug." [abstract] 

"The present invention exploits the recent advances in genome science to 
provide for the rapid screening of large numbers of compounds against a systemic 
target comprising substantially all targets in a pathway [or] organism." [col. 1] 

"The ensemble of reporting cells comprises as comprehensive a collection 
of transcription regulatory genetic elements as is conveniently available for the 
targeted organism so as to most accurately model the systemic transcriptional 
response. Suitable ensembles generally comprise thousands of individually 
reporting elements; preferred ensembles are substantially comprehensive, i.e. 
provide a transcriptional response diversity comparable to that of the target 
organism. Generally, a substantially comprehensive ensemble requires transcription 
regulatory genetic elements from at least a majority of the organism's genes, and 
preferably includes those of all or nearly all of the genes. We term such a 
substantially comprehensive ensemble a genome reporter matrix." [col. 2] 

"Drugs often have side effects that are in part due to the lack of target 
specificity. . . . [A] genome reporter matrix reveals the spectrum of other genes in 
the genome also affected by the compound. In considering two different 
compounds both of which induce the ERG10 reporter, if one compound affects the 
expression of 5 other reporters and a second compound affects the expression of 50 
other reporters, the first compound is, a priori, more likely to have fewer side 
effects." [cols. 2-3] 

"Furthermore, it is not necessary to know the identity of any of the 
responding genes." [col. 3] 

"[A]ny new compound that induces the same response profile as [a] . . . 
dominant tubulin mutant would provide a candidate for a taxol-like 
pharmaceutical." [col. 4] 

"The genome reporter matrix offers a simple solution to recognizing new 
specificities in combinatorial libraries. Specifically, pools of new compounds are 
tested as mixtures across the matrix. If the pool has any new activity not present in 
the original lead compound, new genes are affected among the reporters." [col. 4] 

"A sufficient number of different recombinant cells are included to provide 
an ensemble of transcriptional regulatory elements of said organism sufficient to 
model the transcriptional responsiveness of said organism to a drug. In a preferred 
embodiment, the matrix is substantially comprehensive for the selected regulatory 
elements, e.g. essentially all of the gene promoters of the targeted organism are 
included." [cols. 6-7] 

"In a preferred embodiment, the basal response profiles are determined. . . . 
The resultant electrical output signals are stored in a computer memory as genome 
reporter output signal matrix data structure associating each output signal with the 
coordinates of the corresponding microtiter plate well and the stimulus or drug. 
This information is indexed against the matrix to form reference response profiles 
that are used to determine the response of each reporter to any milieu in which a 
stimulus may be provided. After establishing a basal response profile for the 
matrix, each cell is contacted with a candidate drug. The term drug is used loosely 
to refer to agents which can provoke a specific cellular response. . . . The drug 
induces a complex response pattern of repression, silence and induction across the 
matrix. . . . The response profile reflects the cell's transcriptional adjustments to 
maintain homeostasis in the presence of the drug. . . . After contacting the cells with 
the candidate drug, the reporter gene product signals from each of said cells is again 
measured to determine a stimulated response profile. The basal o[r] background 
response profile is then compared with ... the stimulated response profile to 
identify the cellular response profile to the candidate drug." [cols. 7-8] 

"In another embodiment of the invention, a matrix [i.e., array] of 
hybridization probes corresponding to a predetermined population of genes of the 
selected organism is used to specifically detect changes in gene transcription which 
result from exposing the selected organism or cells thereof to a candidate drug. In 
this embodiment, one or more cells derived from the organism is exposed to the 
candidate drug in vivo or ex vivo under conditions wherein the drug effects a 
change in gene transcription in the cell to maintain homeostasis. Thereafter, the 
gene transcripts, primarily mRNA, of the cell or cells is isolated . . . [and] then 
contacted with an ordered matrix [array] of hybridization probes, each probe being 
specific for a different one of the transcripts, under conditions where each of the 
transcripts hybridizes with a corresponding one of the probes to form hybridization 
pairs. The ordered matrix of probes provides, in aggregate, complements for an 
ensemble of genes of the organism sufficient to model the transcriptional 
responsiveness of the organism to a drug. . . . The matrix-wide signal profile of the 
drug-stimulated cells is then compared with a matrix-wide signal profile of negative 
control cells to obtain a specific drug response profile." [col. 8] 

"The invention also provides means for computer-based qualitative analysis 
of candidate drugs and unknown compounds. A wide variety of reference response 
profiles may be generated and used in such analyses." [col. 8] 

"Response profiles for an unknown stimulus (e.g. new chemicals, unknown 
compounds or unknown mixtures) may be analyzed by comparing the new stimulus 
response profiles with response profiles to known chemical stimuli." [col. 9] 

"The response profile of a new chemical stimulus may also be compared to 
a known genetic response profile for target gene(s)." [col. 9] 

The August 11, 1997 press release from the '588 patent's assignee, Acacia Biosciences 
(now part of Merck) (reference "9h" attached hereto), and the September 15, 1997 news report by 
Glaser, "Strategies for Target Validation Streamline Evaluation of Leads," Genetic Engineering 
News (reference "9i" attached hereto), attest to the commercial value of the methods and technology 
described and claimed in the '588 patent. 

WO 97/13877 ("Measurement of Gene Expression Profiles in Toxicity 
Determinations"), published April 17, 1997, describes an expression profiling technology 
differing somewhat from the use of cDNA microarrays and differing from the genome reporter 
matrix of the '588 patent; but the use of the data is analogous. As per its title, the reference 
describes use of expression profiling in toxicity determinations. In particular, and with emphasis 
added: 

"[T]he invention relates to a method for detecting and monitoring changes 
in gene expression patterns in in vitro and in vivo systems for determining the 
toxicity of drug candidates." [Field of the invention] 

"An object of the invention is to provide a new approach to toxicity 
assessment based on an examination of gene expression patterns, or profiles, in in 
vitro or in vivo test systems." [page 3] 

"Another object of the invention is to provide a rapid and reliable method 
for correlating gene expression with short term and long term toxicity in test 
animals." [page 3] 

"The invention achieves these and other objects by providing a method for 
massively parallel signature sequencing of genes expressed in one or more selected 
tissues of an organism exposed to a test compound. An important feature of the 
invention is the application of novel . . . methodologies that permit the formation of 
gene expression profiles for selected tissues. . . . Such profiles may be compared 
with those from tissues of control organisms at single or multiple time points to 
identify expression patterns predictive of toxicity." [page 3] 

"As used herein, the terms 'gene expression profile,' and 'gene expression 
pattern' which is used equivalently, means a frequency distribution of sequences of 
portions of cDNA molecules sampled from a population of tag-cDNA conjugates. 
. . . Preferably, the total number of sequences determined is at least 1000; more 
preferably, the total number of sequences determined in a gene expression profile is 
at least ten thousand." [page 7] 

"The invention provides a method for determining the toxicity of a 
compound by analyzing changes in the gene expression profiles in selected tissues 
of test organisms exposed to the compound. . . . Gene expression profiles derived 
from test organisms are compared to gene expression profiles derived from control 
organisms. ..." [page 7] 

Therefore, the potential benefits to the public, in terms of lives saved and reduced health 
care costs, are enormous. Evidence of the benefits of this information includes: 

In 1999, CV Therapeutics, an Incyte collaborator, was able to use Incyte gene 
expression technology, information about the structure of a known transporter 
gene, and chromosomal mapping location, to identify the key gene associated 
with Tangier disease. This discovery took place over a matter of only a few 
weeks, due to the power of these new genomics technologies. The discovery 
received an award from the American Heart Association as one of the top 10 
discoveries associated with heart disease research in 1999. 

In an April 9, 2000, article published by the Bloomberg news service, an Incyte 
customer stated that it had reduced the time associated with target discovery and 
validation from 36 months to 18 months, through use of Incyte's genomic 
information database. Other Incyte customers have privately reported similar 
experiences. The implications of this significant saving of time and expense for 
the number of drugs that may be developed and their cost are obvious. 

In a February 10, 2000, article in the Wall Street Journal, one Incyte customer 
stated that over 50 percent of the drug targets in its current pipeline were derived 
from the Incyte database. Other Incyte customers have privately reported similar 
experiences. By doubling the number of targets available to pharmaceutical 
researchers, Incyte genomic information has demonstrably accelerated the 
development of new drugs. 

Because the Patent Examiner failed to address or consider the "well-established" utilities 
for the claimed invention in toxicology testing, drug development, and the diagnosis of disease, 
the Examiner's rejections should be overturned regardless of their merit. 

C. The uncontested fact that the claimed polynucleotide encodes a protein in the 
GPCR family also demonstrates utility 

In addition to having substantial, specific and credible utilities in numerous gene 
expression monitoring applications, it is undisputed that the claimed polynucleotide encodes for 
a protein having the sequence shown as SEQ ID NO:1 in the patent application. Appellants have 
demonstrated that SEQ ID NO:1 is a member of the GPCR family, and that the GPCR family of 
proteins includes glutamate GPCRs that function in neurotransmission, and play a role in certain 
neurological disorders. 

The Patent Examiner does not dispute any of the facts set forth in the previous paragraph. 
Neither does the Patent Examiner dispute that, if a polynucleotide encodes for a protein that has a 
substantial, specific and credible utility, then it follows that the polynucleotide also has a 
substantial, specific and credible utility. 

The Examiner must accept the applicant's demonstration that the polypeptide encoded by 
the claimed invention is a member of the GPCR family and that utility is proven by a reasonable 
probability unless the Examiner can demonstrate through evidence or sound scientific reasoning 
that a person of ordinary skill in the art would doubt utility. See In re Langer, 503 F.2d 1380, 
1391-92, 183 USPQ 288 (CCPA 1974). The Examiner has not provided sufficient evidence or 
sound scientific reasoning to the contrary. 

Nor has the Examiner provided any evidence that any member of the GPCR family, let 
alone a substantial number of those members, is not useful. In such circumstances, the only 
reasonable inference is that the polypeptide encoded by the claimed invention must be, like the 
other members of the GPCR family, useful. 

D. Objective evidence corroborates the utilities of the claimed invention 

There is, in fact, no restriction on the kinds of evidence a Patent Examiner may consider 
in determining whether a "real-world" utility exists. "Real-world" evidence, such as evidence 
showing actual use or commercial success of the invention, can demonstrate conclusive proof of 
utility. Raytheon v. Roper, 220 USPQ 592 (Fed. Cir. 1983); Nestle v. Eugene, 55 F.2d 854, 
856, 12 USPQ 335 (6th Cir. 1932). Indeed, proof that the invention is made, used or sold by any 
person or entity other than the patentee is conclusive proof of utility. United States Steel Corp. 
v. Phillips Petroleum Co., 865 F.2d 1247, 1252, 9 USPQ2d 1461 (Fed. Cir. 1989). 

Over the past several years, a vibrant market has developed for databases containing the 
sequences of all expressed genes (along with the polypeptide translations of those genes), in 
particular genes having medical and pharmaceutical significance such as the instant sequence. 
(Note that the value in these databases is enhanced by their completeness, but each sequence in 
them is independently valuable.) The databases sold by Appellants' assignee, Incyte, include 
exactly the kinds of information made possible by the claimed invention, such as tissue and 
disease associations. Incyte sells its database containing the claimed sequence and millions of 
other sequences throughout the scientific community, including to pharmaceutical companies 
who use the information to develop new pharmaceuticals. 

Both Incyte's customers and the scientific community have acknowledged that Incyte's 
databases have proven to be valuable in, for example, the identification and development of drug 
candidates. Page et al., in discussing the identification and assignment of candidate drug targets, 
state that "rapid identification and assignment of candidate targets and markers represents a huge 
challenge ... [t]he process of annotation is similarly aided by the quantity and richness of the 
sequence specific databases that are currently available, both in the public domain and in the 
private sector (e.g. those supplied by Incyte Pharmaceuticals)" (Page, M.J. et al., "Proteomics: a 
major new technology for the drug discovery process," Drug Discov. Today 4:55-62 (1999) 
(Reference No. 6), see page 58, col. 2). As Incyte adds information to its databases, including 
the information that can be generated only as a result of Incyte's invention of the claimed 
polynucleotide and its use of that polynucleotide on cDNA microarrays, the databases become 
even more powerful tools. Thus the claimed invention adds more than incremental benefit to the 
drug discovery and development process. 

III. The Patent Examiner's rejections are without merit 

Rather than responding to the evidence demonstrating utility, the Examiner attempts to 
dismiss it altogether by arguing that the disclosed and well-established utilities for the claimed 
polynucleotide are not "specific, substantial, and credible" utilities. (Final Office Action at page 
3). The Examiner is incorrect both as a matter of law and as a matter of fact. 

A. The precise biological role or function of an expressed polynucleotide is not 
required to demonstrate utility 

The Patent Examiner's primary rejection of the claimed invention is based on the ground 
that, without information as to the precise "biological role" of the claimed invention, the claimed 
invention's utility is not sufficiently specific. According to the Examiner, it is not enough that a 
person of ordinary skill in the art could use and, in fact, would want to use the claimed invention 
either by itself or in a cDNA microarray to monitor the expression of genes for such applications 
as the evaluation of a drug's efficacy and toxicity. The Examiner would require, in addition, that 
the applicant provide a specific and substantial interpretation of the results generated in any 
given expression analysis. 

It may be that specific and substantial interpretations and detailed information on 
biological function are necessary to satisfy the requirements for publication in some technical 
journals, but they are not necessary to satisfy the requirements for obtaining a United States 
patent. The relevant question is not, as the Examiner would have it, whether it is known how or 
why the invention works, In re Cortright, 165 F.3d 1353, 1359 (Fed. Cir. 1999), but rather 
whether the invention provides an "identifiable benefit" in presently available form. Juicy Whip 
Inc. v. Orange Bang Inc., 185 F.3d 1364, 1366 (Fed. Cir. 1999). If the benefit exists, and there is 
a substantial likelihood the invention provides the benefit, it is useful. There can be no doubt, 
particularly in view of the First Bedilion Declaration (at, e.g., ¶¶ 10 and 15), that the present 
invention meets this test. 

The threshold for determining whether an invention produces an identifiable benefit is 
low. Juicy Whip, 185 F.3d at 1366. Only those utilities that are so nebulous that a person of 
ordinary skill in the art would not know how to achieve an identifiable benefit and, at least 
according to the PTO guidelines, so-called "throwaway" utilities that are not directed to a person 
of ordinary skill in the art at all, do not meet the statutory requirement of utility. Utility 
Examination Guidelines, 66 Fed. Reg. 1092 (Jan. 5, 2001). 

Knowledge of the biological function or role of a biological molecule has never been 
required to show real-world benefit. In its most recent explanation of its own utility guidelines, 
the PTO acknowledged as much (66 F.R. at 1095): 

[T]he utility of a claimed DNA does not necessarily depend on the function of the 
encoded gene product. A claimed DNA may have specific and substantial utility 
because, e.g., it hybridizes near a disease-associated gene or it has gene-regulating 
activity. 

By implicitly requiring knowledge of biological function for any claimed nucleic acid, 
the Examiner has, contrary to law, elevated what is at most an evidentiary factor into an absolute 
requirement of utility. Rather than looking to the biological role or function of the claimed 
invention, the Examiner should have looked first to the benefits it is alleged to provide. 

B. Membership in a class of useful products can be proof of utility 

Despite the uncontradicted evidence that the claimed polynucleotide encodes a 
polypeptide in the GPCR family, the Examiner refused to impute the utility of the members of 
the GPCR family to SEQ ID NO:1. In the Final Office Action, the Patent Examiner takes the 
position that, unless Appellants can identify which particular biological function within the class 
of GPCRs is possessed by SEQ ID NO: 1, utility cannot be imputed. See Final Office Action, 
page 4. To demonstrate utility by membership in the class of GPCRs, the Examiner would 
require that all GPCRs possess a "common" utility. 

There is no such requirement in the law. In order to demonstrate utility by membership 
in a class, the law requires only that the class not contain a substantial number of useless 
members. So long as the class does not contain a substantial number of useless members, there 
is sufficient likelihood that the claimed invention will have utility, and a rejection under 
35 U.S.C. § 101 is improper. That is true regardless of how the claimed invention ultimately is
used and whether or not the members of the class possess one utility or many. See Brenner v. 
Manson, 383 U.S. 519, 532 (1966); Application of Kirk, 376 F.2d 936, 943 (CCPA 1967). 

Membership in a "general" class is insufficient to demonstrate utility only if the class
contains a sufficient number of useless members such that a person of ordinary skill in the art 
could not impute utility by a substantial likelihood. There would be, in that case, a substantial 
likelihood that the claimed invention is one of the useless members of the class. In the few cases 
in which class membership did not prove utility by substantial likelihood, the classes did in fact 
include predominately useless members. E.g., Brenner (man-made steroids); Kirk (same); Natta
(man-made polyethylene polymers). 

The Examiner addresses GPCRs as if the general class in which it is included is not the 
GPCR family, but rather all polynucleotides or all polypeptides, including the vast majority of 
useless theoretical molecules not occurring in nature, and thus not pre-selected by nature to be 
useful. While these "general classes" may contain a substantial number of useless members, the 
GPCR family does not. The GPCR family is sufficiently specific to rule out any reasonable 
possibility that SEQ ID NO: 1 would not also be useful like the other members of the family. 

Because the Examiner has not presented any evidence that the GPCR class of signaling 
molecules has any, let alone a substantial number, of useless members, the Examiner must 
conclude that there is a "substantial likelihood" that the SEQ ID NO: 1 encoded by the claimed 
polynucleotide is useful. It follows that the claimed polynucleotide also is useful. 

C. Because the uses of the claimed polynucleotide in toxicology testing, drug
discovery, and disease diagnosis are practical uses beyond mere study of the
invention itself, the claimed invention has substantial utility

The Examiner's rejection of the claims at issue as not having a "substantial" use is 
tantamount to a rejection based on an allegation that the only use of the claimed invention is as a 
tool for further research. Because the PTO's rejection assumes a substantial overstatement of
the law, and is incorrect in fact, it must be overturned. 

There is no authority for the proposition that use as a tool for research is not a substantial
utility. Indeed, the Patent Office has recognized that just because an invention is used in a
research setting does not mean that it lacks utility (Section 2107.01 of the Manual of Patent
Examining Procedure, 8th Edition, August 2001, under the heading I. Specific and Substantial
Requirements, Research Tools):

Many research tools such as gas chromatographs, screening assays, and nucleotide 
sequencing techniques have a clear, specific and unquestionable utility (e.g., they are
useful in analyzing compounds). An assessment that focuses on whether an invention is
useful only in a research setting thus does not address whether the specific invention is in
fact "useful" in a patent sense. Instead, Office personnel must distinguish between 
inventions that have a specifically identified utility and inventions whose specific utility 
requires further research to identify or reasonably confirm. 

The Patent Office's actual practice has been, at least until the present, consistent with that 
approach. It has routinely issued patents for inventions whose only use is to facilitate research, 
such as DNA ligases. These are acknowledged by the PTO's Training Materials themselves to 
be useful, as well as DNA sequences used, for example, as markers. 

Only a limited subset of research uses are not "substantial" utilities: those in which the 
only known use for the claimed invention is to be an object of further study, thus merely inviting 
further research. This follows from Brenner, in which the U.S. Supreme Court held that a 
process for making a compound does not confer a substantial benefit where the only known use
of the compound was to be the object of further research to determine its use. Id. at 535.
Similarly, in Kirk, the Court held that a compound would not confer substantial benefit on the 
public merely because it might be used to synthesize some other, unknown compound that would 
confer substantial benefit. Kirk, 376 F.2d at 940, 945 ("What appellants are really saying to 
those in the art is take these steroids, experiment, and find what use they do have as medicines."). 
Nowhere do those cases state or imply, however, that a material cannot be patentable if it has 
some other beneficial use in research. 

D. The Patent Examiner failed to demonstrate that a person of ordinary skill in
the art would reasonably doubt the utility of the claimed invention 

The Examiner alleges that applicants' asserted use of the claimed polynucleotide in the
detection and diagnosis of cancer, in particular, thyroid cancer, is based on a correlation with
thyroid cancer in a single library representing follicular carcinoma of the thyroid. See
specification, at page 35. Applicants reiterate that the asserted utility for the polynucleotide
encoding SEQ ID NO: 1 in the detection and diagnosis of follicular carcinoma of the thyroid,
based on a significant (4-fold) differential expression in that disease condition, is specific,
substantial, and credible. The Examiner's allegation that the asserted utility is not credible
because it is based on expression of the transcript in only one library ignores the fact that a
number of thyroid libraries were examined representing both normal and diseased thyroid, and
that only libraries associated with thyroid cancer were found to express the gene. In particular,
the gene was most highly expressed in a thyroid follicular carcinoma tumor library
(THYRTUP02), but was also expressed in a library associated with follicular adenoma
(THYRNOT03), a precancerous condition to follicular carcinoma. Such evidence provides more
than a "substantial likelihood" that the polynucleotide may be used in the detection and diagnosis
of the disease. Further, the evidence provided from the Northern analysis for SEQ ID NO:7
supports applicants' assertion of the use of the claimed polynucleotide in cancer as disclosed in
the Bandman '513 priority application at pages 29-30. The Examiner's reliance on references
such as the NCI Guidelines for Marker Development to support her position is merely an attempt
to raise the standard for utility to one of near certainty. However, the standard applicable in this
case is not proof to certainty, but rather proof to reasonable probability. Brenner, 383 U.S. at
532.

Applicants' Showing of Facts Overcomes The Examiner's Concern That
Applicants' Invention Lacks "Specific Utility"

The Examiner alleges that the claimed invention is not supported by either a specific and 
substantial asserted utility or a well established utility. (Final Office Action, page 3.) 

Appellants' submission of additional facts overcomes this concern. Those facts 
demonstrate that, far from applying regardless of the specific properties of the claimed 
invention, the utility of Appellants' claimed polynucleotides as gene-specific probes depends
upon specific properties of the polynucleotides, that is, their nucleic acid sequences.

"[E]ach probe on ... [a "high density spotted microarray[]"], with careful design and
sufficient length, and with sufficiently stringent hybridization and wash conditions, binds
specifically and with minimal cross-hybridization, to the probe's cognate transcript"; "[e]ach
gene included as a probe on a microarray provides a signal that is specific to the cognate
transcript, at least to a first approximation." ^ Accordingly, "each additional probe makes an
additional transcript newly detectable by the microarray, increasing the detection range, and thus
versatility, of this analytical device for gene expression profiling"; ^ equally, "[e]ach new
gene-specific probe added to a microarray thus increases the number of genes detectable by the
device, increasing the resolving power of the device." *

Although not required for present purposes, it would be appropriate to state on the record 
here that the specificity of nucleic acid hybridization was well-established far earlier than the 
development of high density spotted microarrays in 1995, and indeed is the well-established 
underpinning of many, perhaps most, molecular biological techniques developed over the past
30-40 years.

IV. By requiring the patent applicant to assert a particular or unique utility, the Patent 
Examination Utility Guidelines and Training Materials applied by the Patent 
Examiner misstate the law 

There is an additional, independent reason to overturn the rejections: to the extent the 
rejections are based on Revised Interim Utility Examination Guidelines (64 FR 71427, 
December 21, 1999), the final Utility Examination Guidelines (66 FR 1092, January 5, 2001) 
and/or the Revised Interim Utility Guidelines Training Materials (USPTO Website
www.uspto.gov, March 1, 2000), the Guidelines and Training Materials are themselves 
inconsistent with the law. 



Declaration of Dr. John C. Rockett, ¶ 10(i), emphasis added.

Declaration of Dr. Vishwanath R. Iyer, ¶ 7 (emphasis added). See the footnote at ¶ 7 for a slightly more
"nuanced" view.

^ Declaration of Dr. John C. Rockett, ¶ 10(ii).
* Declaration of Dr. Vishwanath R. Iyer, ¶ 7.



09/895,686 



Docket No.: PC-0044 CIP 

The Training Materials, which direct the Examiners regarding how to apply the Utility 
Guidelines, address the issue of specificity with reference to two kinds of asserted utilities:
"specific" utilities which meet the statutory requirements, and "general" utilities which do not.
The Training Materials define a "specific utility" as follows:

A [specific utility] is specific to the subject matter claimed. This contrasts to general 
utility that would be applicable to the broad class of invention. For example, a claim to a
polynucleotide whose use is disclosed simply as "gene probe" or "chromosome marker" 
would not be considered to be specific in the absence of a disclosure of a specific DNA 
target. Similarly, a general statement of diagnostic utility, such as diagnosing an 
unspecified disease, would ordinarily be insufficient absent a disclosure of what condition 
can be diagnosed. 

The Training Materials distinguish between "specific" and "general" utilities by assessing
whether the asserted utility is sufficiently "particular," i.e., unique (Training Materials at page
52) as compared to the "broad class of invention." (In this regard, the Training Materials appear
to parallel the view set forth in Stephen G. Kunin, Written Description Guidelines and Utility 
Guidelines, 82 J.P.T.O.S. 77, 97 (Feb. 2000) ("With regard to the issue of specific utility the 
question to ask is whether or not a utility set forth in the specification is particular to the claimed 
invention.")). 

Such "unique" or "particular" utilities never have been required by the law. To meet the
utility requirement, the invention need only be "practically useful," Natta, 480 F.2d at 1397,
and confer a "specific benefit" on the public. Brenner, 383 U.S. at 534. Thus, incredible
"throwaway" utilities, such as trying to "patent a transgenic mouse by saying it makes great snake
food," do not meet this standard. Karen Hall, Genomic Warfare, The American Lawyer 68 (June
2000) (quoting John Doll, Chief of the Biotech Section of USPTO).

This does not preclude, however, a general utility, contrary to the statement in the
Training Materials where "specific utility" is defined (page 5). Practical real-world uses are not
limited to uses that are unique to an invention. The law requires that the practical utility be
"definite," not particular. Montedison, 664 F.2d at 375. Appellants are not aware of any court
that has rejected an assertion of utility on the grounds that it is not "particular" or "unique" to the
specific invention. Where courts have found utility to be too "general," it has been in those cases
in which the asserted utility in the patent disclosure was not a practical use that conferred a
specific benefit. That is, a person of ordinary skill in the art would have been left to guess as to
how to benefit at all from the invention. In Kirk, for example, the CCPA held the assertion that a
man-made steroid had "useful biological activity" was insufficient where there was no information
in the specification as to how that biological activity could be practically used. Kirk, 376
F.2d at 941.

The fact that an invention can have a particular use does not provide a basis for requiring 
a particular use. See Brana, supra (disclosure describing a claimed antitumor compound as 
being homologous to an antitumor compound having activity against a "particular" type of 
cancer was determined to satisfy the specificity requirement). "Particularity" is not and never has 
been the sine qua non of utility; it is, at most, one of many factors to be considered. 

As described supra, broad classes of inventions can satisfy the utility requirement so long 
as a person of ordinary skill in the art would understand how to achieve a practical benefit from 
knowledge of the class. Only classes that encompass a significant portion of nonuseful members 
would fail to meet the utility requirement. Montedison, 664 F.2d at 374-75.

The Training Materials fail to distinguish between broad classes that convey information 
of practical utility and those that do not, lumping all of them into the latter, unpatentable 
category, of "general" utilities. As a result, the Training Materials paint with too broad a brush. 
Rigorously applied, they would render unpatentable whole categories of inventions that 
heretofore have been considered to be patentable and that have indisputably benefitted the public, 
including the claimed invention. See supra § II.B. Thus the Training Materials cannot be
applied consistently with the law. 

V. To the extent the rejection of the claimed invention under 35 U.S.C. § 112, first 
paragraph, is based on the improper rejection for lack of utility under 35 U.S.C. 
§ 101, it must be reversed. 

The rejection set forth in the Office Action is based on the assertions discussed above, 
i.e., that the claimed invention lacks patentable utility. To the extent that the rejection under 35 
U.S.C. § 112, first paragraph, is based on the improper allegation of lack of patentable utility
under 35 U.S.C. § 101, it fails for the same reasons. 

CONCLUSION

Appellants respectfully submit that rejections for lack of utility based, inter alia, on an
allegation of "lack of specificity," as set forth in the Office Action and as justified in the Revised
Interim and final Utility Guidelines and Training Materials, are not supported in the law. Neither
are they scientifically correct, nor supported by any evidence or sound scientific reasoning.
These rejections are alleged to be founded on facts in court cases such as Brenner and Kirk, yet
those facts are clearly distinguishable from the facts of the instant application, and indeed most if
not all nucleotide and protein sequence applications. Nevertheless, the PTO is attempting to
mold the facts and holdings of these prior cases, "like a nose of wax," ' to target rejections of
claims to polypeptide and polynucleotide sequences, where biological activity information has
not been proven by laboratory experimentation, and they have done so by ignoring perfectly
acceptable utilities fully disclosed in the specifications as well as well-established utilities known
to those of skill in the art. As is disclosed in the specification, and even more clearly, as one of
ordinary skill in the art would understand, the claimed invention has well-established, specific,
substantial and credible utilities. The rejections are, therefore, improper and should be reversed.

Moreover, to the extent the above rejections were based on the Revised Interim and final
Examination Guidelines and Training Materials, those portions of the Guidelines and Training 
Materials that form the basis for the rejections should be determined to be inconsistent with the 
law. 

Claims 1-6 stand rejected under 35 U.S.C. § 112, first paragraph, as containing subject
matter which is not described in the specification in such a way as to reasonably convey to one
skilled in the relevant art that the inventor(s), at the time the application was filed, had 
possession of the claimed invention. The rejection alleges in particular, that: 

while the specification describes a polypeptide sequence consisting of SEQ ID NO: 1, the
claims encompass polypeptides comprising fragments and homologues that vary 



"The concept of patentable subject matter under § 101 is not 'like a nose of wax which
may be turned and twisted in any direction * * *.' White v. Dunbar, 119 U.S. 47, 51." (Parker v.
Flook, 198 USPQ 193 (US SupCt 1978))
substantially in length and also in amino acid composition. The instant disclosure of a
single polypeptide, that of SEQ ID NO: 1, does not support the scope of the claimed
genus, which encompasses a substantial variety of subgenera. See Regents of the
University of California v. Eli Lilly with respect to the premise that "A description of a
genus of cDNAs may be achieved by means of a recitation of a representative number of
cDNAs, defined by nucleotide sequence, falling within the scope of the genus, or a
recitation of structural features common to the genus, which features constitute a
substantial portion of the genus". The Examiner then cited various references alleged to
support the unpredictability of protein function based on sequence homology. See, in
particular, Vukicevic et al.; Tischer et al.; and Kopchick et al. The Examiner concluded
by saying that given the unpredictability of homology comparisons, and the fact that the 
specification fails to provide objective evidence that the additional sequences are indeed 
species of the claimed genus, it cannot be established that a representative number of 
species have been disclosed by the claims. Further, the Examiner stated, no activity is set 
forth for the additional sequences. 

The recited fragments and variants of SEQ ID NO:1 and SEQ ID NO:2 are
sufficiently described in chemical and structural terms that the skilled artisan would
recognize applicant's possession of them at the time the application was filed

With respect to fragments of SEQ ID NO: 1, as recited in claim 1, applicants submit that 
the recited fragments are disclosed in the specification and claims in terms of their specific 
amino acid sequences and therefore clearly meet the requirements for written description under 
35 U.S.C. § 112, first paragraph.

The claimed "homologues" of SEQ ID NO: 1 referred to by the Examiner presumably 
relate to variants of SEQ ID NO: 1 and SEQ ID NO:7, as recited in claims 1 and 2, respectively.
Applicants submit that the polypeptides and polynucleotides of the invention, including the 
recited variants, are adequately described in accordance with 35 U.S.C. § 112, first paragraph, 
and supported by relevant case law, some of which is referred to by the Examiner. 

The requirements necessary to fulfill the written description requirement of 35 U.S.C.
§ 112, first paragraph, are well established by case law.

. . . the applicant must also convey with reasonable clarity to those skilled
in the art that, as of the filing date sought, he or she was in possession of the
invention. The invention is, for purposes of the "written description" inquiry,
whatever is now claimed. Vas-Cath, Inc. v. Mahurkar, 19 USPQ2d 1111, 1117
(Fed. Cir. 1991)



Attention is also drawn to the Patent and Trademark Office's own "Guidelines for
Examination of Patent Applications Under the 35 U.S.C. Sec. 112, para. 1", published January 5,
2001, which provide that:

An applicant may also show that an invention is complete by disclosure of
sufficiently detailed, relevant identifying characteristics which provide evidence
that applicant was in possession of the claimed invention, i.e., complete or partial
structure, other physical and/or chemical properties, functional characteristics
when coupled with a known or disclosed correlation between function and
structure, or some combination of such characteristics. What is conventional or
well known to one of ordinary skill in the art need not be disclosed in detail. If a
skilled artisan would have understood the inventor to be in possession of the
claimed invention at the time of filing, even if every nuance of the claims is not
explicitly described in the specification, then the adequate description requirement
is met. 



Thus, the written description standard is fulfilled by both what is specifically disclosed
and what is conventional or well known to one skilled in the art. 

SEQ ID NO: 1 and SEQ ID NO:7 are specifically disclosed in the priority application
Serial No. 09/156,513 (see, for example, page 2, lines 34-37 and page 3, lines 13-14). Variants
of SEQ ID NO: 1 and SEQ ID NO:7 are described, for example, at page 2, line 38 through page 3,
line 2. In particular, the preferred, more preferred, and most preferred variants (80%, 90%, and
95% amino acid sequence similarity to SEQ ID NO: 1) are described, for example, at page 12,
lines 13-16 of priority application Serial No. 09/156,513. Incyte clones in which the nucleic
acids encoding the human HGPRP-1 (SEQ ID NO: 1) were first identified and libraries from
which those clones were isolated are described, for example, at page 11, lines 24-30 and Table 1
of the priority application. Chemical and structural features of SEQ ID NO: 1 are described, for
example, on page 11, lines 31-35 and Table 2 of the priority application. Given SEQ ID NO: 1,
one of ordinary skill in the art would recognize naturally-occurring variants of SEQ ID NO: 1
having at least 90% sequence identity to SEQ ID NO: 1. Accordingly, the Specification provides
an adequate written description of the recited polypeptide sequences.

A. The Specification provides an adequate written description of the claimed "variants" of
SEQ ID NO:1.

The Office Action has further asserted that the claims are not supported by an adequate 
written description because: 

Claims 1-6 contain "subject matter which is not described in the specification in
such a way as to reasonably convey to one skilled in the relevant art that the
inventor(s), at the time the application was filed, had possession of the claimed 
invention". 

(page 8 of the Final Office Action) 

Such a position is believed to present a misapplication of the law. 

1. The present claims specifically define the claimed genus through the recitation of 
chemical structure 

Court cases in which "DNA claims" have been at issue (which are hence relevant to
claims to proteins encoded by the DNA and antibodies which specifically bind to the proteins)
commonly emphasize that the recitation of structural features or chemical or physical properties 
are important factors to consider in a written description analysis of such claims. For example, in 
Fiers v. Revel, 25 USPQ2d 1601, 1606 (Fed. Cir. 1993), the court stated that: 

If a conception of a DNA requires a precise definition, such as by structure, 
formula, chemical name or physical properties, as we have held, then a description 
also requires that degree of specificity. 

In a number of instances in which claims to DNA have been found invalid, the courts
have noted that the claims attempted to define the claimed DNA in terms of functional 
characteristics without any reference to structural features. As set forth by the court in University 
of California v. Eli Lilly and Co. , 43 USPQ2d 1398, 1406 (Fed. Cir. 1997): 

In claims to genetic material, however, a generic statement such as "vertebrate 
insulin cDNA" or "mammalian insulin cDNA," without more, is not an adequate
written description of the genus because it does not distinguish the claimed genus 
from others, except by function. 

Thus, the mere recitation of functional characteristics of a DNA, without the definition of
structural features, has been a common basis by which courts have found invalid claims to DNA. 
For example, in Lilly, 43 USPQ2d at 1407, the court found invalid for violation of the written 
description requirement the following claim of U.S. Patent No. 4,652,525: 

1. A recombinant plasmid replicable in procaryotic host containing within its
nucleotide sequence a subsequence having the structure of the reverse transcript of
an mRNA of a vertebrate, which mRNA encodes insulin. 

In Fiers, 25 USPQ2d at 1603, the parties were in an interference involving the following
count:

A DNA which consists essentially of a DNA which codes for a human fibroblast 
interferon-beta polypeptide. 

Party Revel in the Fiers case argued that its foreign priority application contained an 
adequate written description of the DNA of the count because that application mentioned a 
potential method for isolating the DNA. The Revel priority application, however, did not have a 
description of any particular DNA structure corresponding to the DNA of the count. The court 
therefore found that the Revel priority application lacked an adequate written description of the 
subject matter of the count. 

Thus, in Lilly and Fiers, nucleic acids were defined on the basis of functional
characteristics and were found not to comply with the written description requirement of 35
U.S.C. § 112; i.e., "an mRNA of a vertebrate, which mRNA encodes insulin" in Lilly, and "DNA
which codes for a human fibroblast interferon-beta polypeptide" in Fiers. In contrast to the 
situation in Lilly and Fiers, the claims at issue in the present application define polynucleotides 
and polypeptides in terms of chemical structure, rather than functional characteristics. For 
example, the "variant language" of independent claim 1 recites chemical structure to define the 
claimed genus: 

1. An isolated cDNA comprising a nucleic acid encoding an amino acid sequence 
selected from:...c) a variant of SEQ ID NO: 1 having at least 90% amino acid 
sequence identity to SEQ ID NO: 1 . . . 

From the above it should be apparent that the claims of the subject application are
fundamentally different from those found invalid in Lilly and Fiers. The subject matter of the
present claims is defined in terms of the chemical structure of SEQ ID NO: 1. In the present case,
there is no reliance merely on a description of functional characteristics of the polynucleotides or
polypeptides recited by the claims. In fact, there is no recitation of functional characteristics.
Moreover, if such functional recitations were included, it would add to the structural
characterization of the recited polynucleotides or polypeptides. The polynucleotides or
polypeptides defined in the claims of the present application recite structural features, and cases
such as Lilly and Fiers stress that the recitation of structure is an important factor to consider in a
written description analysis of claims of this type. By failing to base its written description
inquiry "on whatever is now claimed," the Office Action failed to provide an appropriate analysis
of the present claims and how they differ from those found not to satisfy the written description
requirement in Lilly and Fiers.

2. The present claims do not define a genus which is "highly variant" 

Furthermore, the claims at issue do not describe a genus which could be characterized as 
highly variant, i.e., "encompassing a substantial variety of subgenera" (Final Office Action, page
8). Available evidence illustrates that the claimed genus is of narrow scope.

In support of this assertion, the Examiner's attention is directed to the reference by 
Brenner et al. ("Assessing sequence comparison methods with reliable structurally identified 
distant evolutionary relationships," Proc. Natl. Acad. Sci. USA (1998) 95:6073-6078; cited at 
page 29 of the instant application). Through exhaustive analysis of a data set of proteins with 
known structural and functional relationships and with <90% overall sequence identity, Brenner 
et al. have determined that 30% identity is a reliable threshold for establishing evolutionary
homology between two sequences aligned over at least 150 residues. (Brenner et al., pages 6073
and 6076.) Furthermore, local identity is particularly important in this case for assessing the
significance of the alignments, as Brenner et al. further report that >40% identity over at least 70
residues is reliable in signifying homology between proteins. (Brenner et al., page 6076.)

The present application is directed, inter alia, to GPCR proteins, in particular,
metabotropic glutamate GPCR proteins related to the amino acid sequence of SEQ ID NO: 1. In
accordance with Brenner et al., naturally occurring molecules may exist which could be
characterized as metabotropic glutamate GPCR proteins and which have as little as 40% identity
over at least 70 residues to SEQ ID NO: 1. The "variant language" of the present claims recites,
for example, polynucleotides encoding "an amino acid sequence having at least 90% amino acid
sequence identity to SEQ ID NO: 1" (note that SEQ ID NO: 1 has 441 amino acid residues). This
variation is far less than that of all potential metabotropic glutamate GPCR proteins related to
SEQ ID NO:1, i.e., those metabotropic glutamate GPCR proteins having as little as 40% identity
over at least 70 residues to SEQ ID NO: 1.
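
By way of illustration only, the arithmetic behind this comparison may be sketched as follows. The sketch assumes that percent identity is counted position by position over the stated length with no gaps, which is an assumption made here for simplicity and not a method stated in the specification; the function name `min_identical` is hypothetical.

```python
def min_identical(length: int, identity_pct: int) -> int:
    """Smallest number of identical positions that reaches identity_pct
    percent identity over `length` aligned residues (assumes no gaps)."""
    # Integer ceiling division avoids floating-point rounding.
    return -(-length * identity_pct // 100)


# Claimed variants: at least 90% identity over the 441 residues of SEQ ID NO: 1.
claimed_matches = min_identical(441, 90)
claimed_max_differences = 441 - claimed_matches

# Thresholds reported by Brenner et al. for reliable homology.
local_matches = min_identical(70, 40)     # 40% identity over 70 residues
global_matches = min_identical(150, 30)   # 30% identity over 150 residues

print(claimed_matches, claimed_max_differences)  # 397 44
print(local_matches, global_matches)             # 28 45
```

Under these assumptions, a variant meeting the claimed 90% threshold may differ from SEQ ID NO: 1 at no more than 44 of 441 positions, whereas the 40% threshold over 70 residues requires only 28 matching positions.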

3. The state of the art at the time of the present invention is further advanced than at 
the time of the Lilly and Fiers applications 

In the Lilly case, claims of U.S. Patent No. 4,652,525 were found invalid for failing to 
comply with the written description requirement of 35 U.S.C. § 112. The '525 patent claimed the
benefit of priority of two applications, Application Serial No. 801,343 filed May 27, 1977, and
Application Serial No. 805,023 filed June 9, 1977. In the Fiers case, party Revel claimed the
benefit of priority of an Israeli application filed on November 21, 1979. Thus, the written
description inquiry in those cases was based on the state of the art at essentially the "dark ages"
of recombinant DNA technology.

The present application has a priority date of September 17, 1998. Much has happened in
the development of recombinant DNA technology in the 20 or more years between the filing
of the applications involved in Lilly and Fiers and the present application. For example, the
technique of polymerase chain reaction (PCR) was invented. Highly efficient cloning and DNA
sequencing technology has been developed. Large databases of protein and nucleotide sequences
have been compiled. Much of the raw material of the human and other genomes has been
sequenced. With these remarkable advances, one of skill in the art would recognize that, given
the sequence information of SEQ ID NO:1 and SEQ ID NO:7, and the additional extensive detail
provided by the subject application, the present inventors were in possession of the claimed
polynucleotide variants at the time of filing of this application.

4. Summary 

The Office Action failed to base its written description inquiry "on whatever is now
claimed." Consequently, the Action did not provide an appropriate analysis of the present claims
and how they differ from those found not to satisfy the written description requirement in cases
such as Lilly and Fiers. In particular, the claims of the subject application are fundamentally
different from those found invalid in Lilly and Fiers. The subject matter of the present claims is
defined in terms of the chemical structure of SEQ ID NO:1 or SEQ ID NO:7. The courts have
stressed that structural features are important factors to consider in a written description analysis
of claims to nucleic acids and proteins. In addition, the genus of polynucleotides or polypeptides
defined by the present claims is adequately described, as evidenced by Brenner et al. and
consideration of the claims of the '740 patent involved in Lilly. Furthermore, there have been
remarkable advances in the state of the art since the Lilly and Fiers cases, and these advances
were given no consideration whatsoever in the position set forth by the Office Action.

Claims 1 and 3-6 stand rejected under 35 U.S.C. § 102(b) as anticipated by Valenzuela et 
al. (WO 99/55271, November 4, 199) and, alternatively, under 35 U.S.C. § 102(e) as anticipated
by Moore et al. (U.S. Published Application 2003005536, effective filing date June 17, 1999).
The rejection alleges in particular, that: 

Valenzuela disclose a nucleic acid molecule (SEQ ID NO:43, claim 52) that encodes a 
protein (SEQ ID NO:45, claim 53) that is 100% identical to the polypeptide of SEQ ID 
NO:7 of the instant application, thus anticipating the claims. Valenzuela et al. also teach
vectors, host cells, a method of producing protein, and labeled cDNA. 

Moore et al. disclose a nucleic acid molecule (SEQ ID NO:22) that encodes a protein
(SEQ ID NO:146) that is 100% identical to the polypeptide of SEQ ID NO:1 from amino
acids 1-384 of the instant application, and therefore discloses an isolated cDNA encoding
a fragment of SEQ ID NO:1 from I51-V72, G88-V109, C116-A145, I156-L175, M207-
P229, or G242-T264 of SEQ ID NO:1, as recited in claim 1. Moore et al. also teach
vectors, host cells, and a method of making a protein, therefore anticipating claims 3-6 as 
well. 

Because the instant application does not meet the requirements of 35 U.S.C. § 112, first
paragraph, for the reasons given above, and it is a continuation of application Serial No. 
09/516,513, the prior application does not meet these requirements and therefore is 
unavailable under 35 U.S.C. § 120. Under these circumstances, Valenzuela et al. and 
Moore et al. anticipate the claimed invention. 

The now claimed invention, at least as recited in claims 1 and 3-6, is supported by 
both a specific and substantial asserted utility and a well established utility that is disclosed
and enabled in priority application Serial No. 09/516,513 

Applicants submit that, for the reasons cited above in response to the rejection of claims 
under 35 U.S.C. §§ 101/112, the specification supports a specific and substantial asserted utility,
as well as a well established utility for the claimed invention that is similarly disclosed in the
priority application Serial No. 09/516,513 in accordance with 35 U.S.C. § 120, therefore 
providing an effective filing date for the instant application of September 17, 1998. 

APPENDIX - CLAIMS ON APPEAL

1. An isolated cDNA comprising a nucleic acid encoding an amino acid sequence selected 
from: 

a) an amino acid sequence of SEQ ID NO:1;

b) a fragment of SEQ ID NO:1 from I51-V72, G88-V109, C116-A145, I156-L175,
M207-P229, or G242-T264 of SEQ ID NO:1;

c) a variant of SEQ ID NO:1 having at least 90% amino acid sequence identity to SEQ
ID NO:1; and

d) the complement of the encoding nucleic acid sequence of a), b), or c).

2. An isolated cDNA comprising a nucleic acid sequence selected from: 

a) SEQ ID NO:7; and

b) a variant of SEQ ID NO:7 having at least 95% identity to SEQ ID NO:7.

3. A composition comprising the cDNA of claim 1 and a labeling moiety. 

4. A vector comprising the cDNA of claim 1. 

5. A host cell comprising the vector of claim 4. 

6. A method for using a cDNA to produce a protein, the method comprising: 

a) culturing the host cell of claim 5 under conditions for protein expression; and 

b) recovering the protein from the host cell culture. 

Whole genome analysis: Experimental access to all genome
sequenced segments through larger-scale efficient
oligonucleotide synthesis and PCR

Contributed by Ronald W. Davis, May 20, 1997

ABSTRACT The recent ability to sequence whole genomes 
allows ready access to all genetic material. The approaches 
outlined here allow automated analysis of sequence for the 
synthesis of optimal primers in an automated multiplex 
oligonucleotide synthesizer (AMOS). The efficiency is such
that all ORFs for an organism can be amplified by PCR. The 
resulting amplicons can be used directly in the construction of 
DNA arrays or can be cloned for a large variety of functional 
analyses. These tools allow a replacement of single-gene 
analysis with a highly efficient whole-genome analysis. 



The genome sequencing projects have generated and will 
continue to generate enormous amounts of sequence data. The 
genomes of Saccharomyces cerevisiae, Escherichia coli, Hae- 
mophilus influenzae (1), Mycoplasma genitalium (2), and Meth- 
anococcus jannaschii (3) have been completely sequenced. 
Other model organisms have had substantial portions of their 
genomes sequenced as well, including the nematode Caeno- 
rhabditis elegans (4) and the small flowering plant Arabidopsis 
thaliana (5). This massive and increasing amount of sequence 
information allows the development of novel experimental 
approaches to identify gene function. 

One standard use of genome sequence data is to attempt to 
identify the functions of predicted open reading frames 
(ORFs) within the genome by comparison to genes of known 
function. Such a comparative analysis of all ORFs to existing 
sequence data is fast, simple, and requires no experimentation 
and is therefore a reasonable first step. While finding sequence 
homologies/motifs is not a substitute for experimentation, 
noting the presence of sequence homology and/or sequence 
motifs can be a useful first step in finding interesting genes, in 
designing experiments and, in some cases, predicting function. 
However, this type of analysis is frequently uninformative. For 
example, over one-half of new ORFs in S. cerevisiae have no
known function (6). If this is the case in a well studied organism 
such as yeast, the problem will be even worse in organisms that 
are less well studied or less manipulable. A large, experimen- 
tally determined gene function database would make homol- 
ogy/motif searches much more useful. 

Experimental analysis must be performed to thoroughly 
understand the biological function of a gene product. Scaling 
up from classical "cottage industry" one-gene-oriented ap- 
proaches to whole-genome analysis would be very expensive 
and laborious. It is clear that novel strategies are necessary to 
efficiently pursue the next phase of the genome projects— 
whole-genome experimental analysis to explore gene expres- 
sion, gene product function, and other genome functions. 
Model organisms, such as S. cerevisiae, will be extremely
important in the development of novel whole-genome analysis
techniques and, subsequently, in improving our understanding
of other more complex and less manipulable organisms.

The publication costs of this article were defrayed in part by page charge
payment. This article must therefore be hereby marked "advertisement" in
accordance with 18 U.S.C. §1734 solely to indicate this fact.

© 1997 by The National Academy of Sciences 0027-8424/97/948945-3$2.00/0
PNAS is available online at http://www.pnas.org.

The genome sequence can be systematically used as a tool 
to understand ORFs, gene product function, and other ge- 
nome regions. Toward this end, a directed strategy has been 
developed for exploiting sequence information as a means of 
providing information about biological function (Fig. 1). Ef- 
forts have been directed toward the amplification of each 
predicted ORF or any other region of the genome ranging 
from a few base pairs to several kilobase pairs. There are many 
uses for these amplicons: they can be cloned into standard
vectors or specialized expression vectors, or can be cloned into 
other specialized vectors such as those used for two-hybrid 
analysis. The amplicons can also be used directly by, for 
example, arraying onto glass for expression analysis, for DNA 
binding assays, or for any direct DNA assay (7). As a pilot 
study, synthetic primers were made on the 96-well automated 
multiplex oligonucleotide synthesizer (AMOS) instrument (8) 
(Fig. 2). These oligonucleotides were used to amplify each 
ORF on yeast chromosome V. The current version of this 
instrument can synthesize three plates of 96 oligonucleotides 
each (25 bases) in an 8-hr day. The amplification of the entire 
set of PCR products was then analyzed by gel electrophoresis 
(Fig. 3). Successful amplification of the proper length product 
on the first attempt was 95%. This project demonstrates that 
one can go directly from sequence information to biological 
analysis in a truly automated, totally directed manner. 

These amplicons can be incorporated directly in arrays or 
the amplicons can be cloned. If the amplicons are to be cloned, 
novel sequences can be incorporated at the 5' end of the 
oligonucleotide to facilitate cloning. One potential problem 
with cloning PCR products is that the cloned amplicons may 
contain sequence alterations that diminish their utility. One 
option would be to resequence each individual amplicon. 
However, this is expensive, inefficient, and time consuming. A 
faster, more cost-effective, and more accurate approach is to 
apply comparative sequencing by denaturing HPLC (9). This 
method is capable of detecting a single base change in a 2-kb 
heteroduplex. Longer amplicons can be analyzed by use of 
appropriate restriction fragments. If any change is detected in 
a clone, an alternate clone of the same region can be analyzed. 
Modifying the system to allow high throughput analysis by 
denaturing HPLC is also relatively simple and straightforward. 

If amplicons are used directly on arrays without cloning, it 
is important to note that, even if single PCR product bands are 
observed on gels, the PCR products will be contaminated with 
various amounts of other sequences. This contamination has 
the potential to affect the results in, for example, expression
analysis. On the other hand, direct use of the amplicons is
much less labor intensive and greatly decreases the occurrence
of mistakes in clone identification, a ubiquitous problem
associated with large clone set archiving and retrieving.

Present address: Synteni, Inc., 6519 Dumbarton Circle, Fremont, CA
94555.

§To whom reprint requests should be addressed at: Department of
Biochemistry, Beckman Center, 8400, Stanford University, Stanford,
CA 94305-5307. e-mail: gilbert@cmgm.stanford.edu.

Fig. 1. Overview of systematic method for isolating individual
genes. Sequence information is obtained automatically from sequence
databases. The data are input into primer selection software specifi-
cally designed to target ORFs as designated by database annotations.
The output file containing the primer information is directly read by
a high-throughput oligonucleotide synthesizer, which makes the oli-
gonucleotides in 96-well plates (AMOS, automated multiplex oligo-
nucleotide synthesizer). The forward and reverse primers are synthe-
sized in the same location on separate plates to facilitate the down-
stream handling of primers. The amplicons are generated by PCR in
96-well plates as well.

Any large-scale effort to capture each ORF within a genome 
must rely on automation if cost is to be minimized while 
efficiency is maximized. Toward that end, primers targeting 
ORFs were designed automatically using simple new scripts 
and existing primer selection software. These script-selected 
primer sequences were directly read by the high-throughput 
synthesizer and the forward and reverse primers were synthe- 
sized in separate plates in corresponding wells to facilitate 
automated pipetting and PCR amplifications. Each of the
resulting PCR products, generated with minimum labor, con-
tains a known, unique ORF.
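
The automated flow from annotated ORF coordinates to paired primer plates can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the scripts or primer selection software used by the authors: the function names are hypothetical, ORFs are assumed to lie on the forward strand with 0-based half-open coordinates, and the primers are taken naively from the ORF ends with no melting-temperature or specificity checks. The 25-base default follows the oligonucleotide length mentioned in the text.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def design_primers(genome: str, start: int, end: int, length: int = 25):
    """Return (forward, reverse) primers flanking genome[start:end].

    Naive choice: the first `length` bases of the ORF and the reverse
    complement of its last `length` bases.
    """
    orf = genome[start:end]
    if len(orf) < length:
        raise ValueError("ORF is shorter than the requested primer length")
    return orf[:length], reverse_complement(orf[-length:])


def well_name(index: int):
    """Map a running index to (plate number, well) on 96-well plates,
    filling rows A-H and columns 1-12 row by row."""
    plate, position = divmod(index, 96)
    row, column = divmod(position, 12)
    return plate + 1, f"{'ABCDEFGH'[row]}{column + 1}"


def plate_layout(genome: str, orfs, length: int = 25):
    """Assign each ORF one well; the forward and reverse primers share that
    well position on two separate plates, as described in the text."""
    layout = []
    for index, (name, start, end) in enumerate(orfs):
        forward, reverse = design_primers(genome, start, end, length)
        plate, well = well_name(index)
        layout.append({"orf": name, "plate": plate, "well": well,
                       "forward": forward, "reverse": reverse})
    return layout
```

Keeping both primers of a pair at the same well coordinate means a pipetting robot can combine plate N (forward) with plate N (reverse) well by well without any lookup table.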

Large-scale genome analysis projects are dependent on 
newly emerging technologies to make the studies practical and 
economically feasible. For example, the cost of the primers, a 
significant issue in the past, has been reduced dramatically to 
make feasible this and other projects that require tens of 
thousands of oligonucleotides. Other methods of high- 
throughput analysis are also vital to the success of functional 
analysis projects, such as microarraying and oligonucleotide 
chip methods (10-14). 

Fig. 2. Overall approach for using database of a genome to direct
biological analysis. The synthesis of the 6,000 ORFs (orfs) for each
gene of S. cerevisiae can be used in many applications utilizing both
cloning and microarraying technology.

Changes in attitude are also required. One of the major costs
of commercial oligonucleotides is extensive quality control
such that virtually 100% of the supplied oligonucleotides are
successfully synthesized and work for their intended purpose.
Considerable cost reduction can be obtained by simply de-
creasing the expected successful synthesis rate to 95-97%. One
can then achieve faster and cheaper whole genome coverage by
simply adding a single quality control at the end of the
experiment and batching the failures for resynthesis.
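
The effect of batching failures can be estimated with simple arithmetic. The Python sketch below is an illustration only: the function name is hypothetical, and it assumes that each oligonucleotide fails independently with the same probability in every round, which the text does not state. The figure of 12,000 primers is likewise an assumption (two primers for each of roughly 6,000 ORFs).

```python
def expected_failures_per_round(n_oligos: int, success_rate: float,
                                max_rounds: int = 10) -> list[float]:
    """Expected number of oligos still failing after each synthesis round.

    Round 1 is the initial synthesis; every later round resynthesizes only
    the batched failures from the previous round. Stops once fewer than one
    failure is expected.
    """
    remaining = float(n_oligos)
    schedule = []
    for _ in range(max_rounds):
        remaining *= 1.0 - success_rate
        schedule.append(remaining)
        if remaining < 1.0:
            break
    return schedule


# About 600 failures after the first pass, about 30 after the second.
print(expected_failures_per_round(12000, 0.95))
```

Under these assumptions the resynthesis batches shrink twentyfold per round, so a single end-of-experiment quality check followed by one or two small repeat batches covers essentially the whole set.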

The directed nature of the amplicon approach is of clear 
advantage. The sequence of each ORF is analyzed automati- 
cally, and unique specific primers are made to target each 
ORF. Thus, there is relatively little time or labor involved; for
example, no random cloning and subsequent screening is
required because each product is known. In the test system,
primers for 240 ORFs from chromosome V were systematically 
synthesized, beginning from the left arm and continuing 
through to the right arm. At no point was there any manual 
analysis of sequence information to generate the collection. In 
many ways, now that the sequence is known, there is no need 
for the researcher to examine it. 

These amplicons can be arrayed and expression analysis can 
be done on all arrayed ORFs with a single hybridization (10). 
Those ORFs that display significant differential expression 
patterns under a given selection are easily identified without 
the laborious task of searching for and then sequencing a clone. 
Once scaled up, the procedure provides even greater returns 
on effort, because a single hybridization will ultimately provide 
a "snapshot" of the expression of all genes in the yeast genome. 
Thus, the limiting factor in whole genome analysis will not be 
the analysis process itself, but will instead be the ability of 
researchers to design and carry out experimental selections. 

Current expression and genetic analysis technologies are
geared toward the analysis of single genes and are ill suited to 
analyze numerous genes under many conditions. Additional 
difficulties with current technologies include: the effort and 
expense required to analyze expression and make mutants, the 
potential duplication of effort if done by different laboratories 
and the possibility of conflicting results obtained from differ- 
ent laboratories. In contrast, whole genome analysis not only 
is more efficient, it also provides data of much higher quality;
all genes are assayed and compared in parallel under exactly
the same conditions. In addition, amplicons have many appli-
cations beyond gene expression. For example, one recent
approach is to incorporate a unique DNA sequence tag,
synthesized as part of each gene specific primer, during
amplification. The tags or molecular bar codes, when reintro-
duced into the organism as a gene deletion or as a gene clone,
can be used much more efficiently than individual mutations
or clones because pools of tagged mutants or transformants
can be analyzed in parallel. This parallel analysis is possible
because the tags are readily and quantitatively amplified even
in complex mixtures of tags (13).

These ORF genome arrays and oligonucleotide tagged
libraries can be used for many applications. Any conventional
selection applied to a library that gives discrete or multiple
products can use these technologies for a simple direct read-
out. These include screens and selections for mutant comple-
mentation, overexpression suppression (15, 16), second-site
suppressors, synthetic lethality, drug target overexpression
(17), two-hybrid screens (18), genome mismatch scanning (19),
or recombination mapping.

The genome projects have provided researchers with a vast
amount of information. These data must be used efficiently
and systematically to gain a truly comprehensive understand-
ing of gene function and, more broadly, of the entire genome
which can then be applied to other organisms. Such global
approaches are essential if we are to gain an understanding of
the living cell. This understanding should come from the
viewpoint of the integration of complex regulatory networks,
the individual roles and interactions of thousands of functional
gene products, and the effect of environmental changes on
both gene regulatory networks and the roles of all gene
products. The time has come to switch from the analysis of a
single gene to the analysis of the whole genome.

Support was provided by National Institutes of Health Grants
R37H60198 and P01H600205.

Fig. 3. Gel image of amplifications. Using the method described in
Fig. 1, amplicons were generated for ORFs of S. cerevisiae chromosome
V. One plate of 96 amplification reactions is shown.

The availability of genome-scale DNA sequence information and reagents has radically altered life-science
research. This revolution has led to the development of a new scientific subdiscipline derived from a combina- 
tion of the fields of toxicology and genomics. This subdiscipline, termed toxicogenomics, is concerned with the 
identification of potential human and environmental toxicants, and their putative mechanisms of action, through 
the use of genomics resources. One such resource is DNA microarrays or "chips," which allow the monitoring of 
the expression levels of thousands of genes simultaneously. Here we propose a general method by which gene 
expression, as measured by cDNA microarrays, can be used as a highly sensitive and informative marker for 
toxicity. Our purpose is to acquaint the reader with the development and current state of microarray technology
and to present our view of the usefulness of microarrays to the field of toxicology.

Mol. Carcinog. 24:153-159, 1999. © 1999 Wiley-Liss, Inc.

INTRODUCTION

Technological advancements combined with in- 
tensive DNA sequencing efforts have generated an 
enormous database of sequence information over the 
past decade. To date, more than 3 million sequences, 
totaling over 2.2 billion bases [1], are contained
within the GenBank database, which includes the
complete sequences of 19 different organisms [2]. The
first complete sequence of a free-living organism,
Haemophilus influenzae, was reported in 1995 [3] and
was followed shortly thereafter by the first complete
sequence of a eukaryote, Saccharomyces cerevisiae [4].
The development of dramatically improved sequenc-
ing methodologies promises that complete elucida-
tion of the Homo sapiens DNA sequence is not far
behind [5].

To exploit more fully the wealth of new sequence 
information, it was necessary to develop novel meth-
ods for the high-throughput or parallel monitoring
of gene expression. Established methods such as
northern blotting, RNAse protection assays, S1 nu-
clease analysis, plaque hybridization, and slot blots
do not provide sufficient throughput to effectively
utilize the new genomics resources. Newer methods
such as differential display [6], high-density filter
hybridization [7,8], serial analysis of gene expression
[9], and cDNA- and oligonucleotide-based microarray
"chip" hybridization [10-12] are possible solutions 
to this bottleneck. It is our belief that the microarray 
approach, which allows the monitoring of expres- 
sion levels of thousands of genes simultaneously, is 
a tool of unprecedented power for use in toxicology 
studies. 



Almost without exception, gene expression is al- 
tered during toxicity, as either a direct or indirect 
result of toxicant exposure. The challenge facing 
toxicologists is to define, under a given set of ex- 
perimental conditions, the characteristic and spe- 
cific pattern of gene expression elicited by a given 
toxicant. Microarray technology offers an ideal plat- 
form for this type of analysis and could be the foun- 
dation for a fundamentally new approach to 
toxicology testing. 

MICROARRAY DEVELOPMENT AND APPLICATIONS 

cDNA Microarrays 

In the past several years, numerous systems were 
developed for the construction of large-scale DNA 
arrays. All of these platforms are based on cDNAs 
or oligonucleotides immobilized to a solid sup- 
port. In the cDNA approach, cDNA (or genomic) 
clones of interest are arrayed in a multi-well for- 
mat and amplified by polymerase chain reaction. 
The products of this amplification, which are usu- 
ally 500- to 2000-bp clones from the 3' regions of 
the genes of interest, are then spotted onto solid 
support by using high-speed robotics. By using 
this method, microarrays of up to 10 000 clones 
can be generated by spotting onto a glass substrate 
[13,14]. Sample detection for microarrays on glass
involves the use of probes labeled with fluores- 
cent or radioactive nucleotides. 

Fluorescent cDNA probes are generated from con- 
trol and test RNA samples in single-round reverse-tran- 
scription reactions in the presence of fluorescently 
tagged dUTP (e.g., Cy3-dUTP and Cy5-dUTP), which 
produces control and test products labeled with dif- 
ferent fluors. The cDNAs generated from these two 
populations, collectively termed the "probe," are then 
mixed and hybridized to the array under a glass cov- 
erslip [10,11,15]. The fluorescent signal is detected 
by using a custom-designed scanning confocal mi- 
croscope equipped with a motorized stage and lasers 
for fluor excitation [10,11,15]. The data are analyzed 
with custom digital image analysis software that de- 
termines for each DNA feature the ratio of fluor 1 to 
fluor 2, corrected for local background [16,17]. The 
strength of this approach lies in the ability to label 
RNAs from control and treated samples with differ- 
ent fluorescent nucleotides, allowing for the simul- 
taneous hybridization and detection of both 
populations on one microarray. This method elimi- 
nates the need to control for hybridization between 
arrays. The research groups of Drs. Patrick Brown and 
Ron Davis at Stanford University spearheaded the 
effort to develop this approach, which has been suc- 
cessfully applied to studies of Arabidopsis thaliana 
RNA [10], yeast genomic DNA [15], tumorigenic ver- 
sus non-tumorigenic human tumor cell lines [11], 
human T-cells [18], yeast RNA [19], and human in- 
flammatory disease-related genes [20]. The most dra- 
matic result of this effort was the first published 
account of gene expression of an entire genome, that 
of the yeast Saccharomyces cerevisiae [21].
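
The per-feature quantity described above, the ratio of fluor 1 to fluor 2 corrected for local background, can be sketched in a few lines. This Python example is an assumption-laden illustration and not the custom image analysis software cited in the text: the function names are hypothetical, and both the intensity floor (to avoid dividing by zero or taking the log of a negative number) and the log2 transform are conventions added here.

```python
import math


def corrected_ratio(fluor1_signal: float, fluor1_background: float,
                    fluor2_signal: float, fluor2_background: float,
                    floor: float = 1.0) -> float:
    """Ratio of fluor 1 to fluor 2 after subtracting each channel's local
    background. Corrected intensities are clamped to `floor`."""
    fluor1 = max(fluor1_signal - fluor1_background, floor)
    fluor2 = max(fluor2_signal - fluor2_background, floor)
    return fluor1 / fluor2


def log2_ratio(*args, **kwargs) -> float:
    """Same ratio on a log2 scale, so induction and repression are symmetric."""
    return math.log2(corrected_ratio(*args, **kwargs))


# Hypothetical feature: 1100 vs 350 raw counts, 100 counts of local background each.
print(corrected_ratio(1100, 100, 350, 100))
print(log2_ratio(1100, 100, 350, 100))
```

Because both channels are read from the same spot on the same array, the ratio cancels spot-to-spot differences in the amount of DNA deposited, which is why no between-array hybridization control is needed.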

In an alternative approach, large numbers of cDNA 
clones can be spotted onto a membrane support, al-
beit at a lower density [7,22]. This method is useful
for expression profiling and large-scale screening and 
mapping of genomic or cDNA clones [7,22-24]. In 
expression profiling on filter membranes, two dif- 
ferent membranes are used simultaneously for con- 
trol and test RNA hybridizations, or a single 
membrane is stripped and reprobed. The signal is 
detected by using radioactive nucleotides and visu- 
alized by phosphorimager analysis or autoradiogra- 
phy. Numerous companies now sell such cDNA 
membranes and software to analyze the image data 
[25-27]. 

Oligonucleotide Microarrays 

Oligonucleotide microarrays are constructed either 
by spotting prefabricated oligos on a glass support 
[13] or by the more elegant method of direct in situ 
oligo synthesis on the glass surface by photolithog- 
raphy [28-30]. The strength of this approach lies in 
its ability to discriminate DNA molecules based on 
single base-pair difference. This allows the applica-
tion of this method to the fields of medical diagnos-
tics, pharmacogenetics, and sequencing by hybrid-
ization as well as gene-expression analysis.

Fabrication of oligonucleotide chips by photoli- 
thography is theoretically simple but technically 
complex [29,30]. The light from a high-intensity 
mercury lamp is directed through a photolitho- 
graphic mask onto the silica surface, resulting in 
deprotection of the terminal nucleotides in the illu- 
minated regions. The entire chip is then reacted with 
the desired free nucleotide, resulting in selected chain 
elongation. This process requires only 4n cycles 
(where n = oligonucleotide length in bases) to syn- 
thesize a vast number of unique oligos, the total num- 
ber of which is limited only by the complexity of the 
photolithographic mask and the chip size [29,31,32]. 
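
The 4n bound can be checked with a small simulation. The Python sketch below is a simplified model under stated assumptions rather than a description of any real fabrication process: the function name is hypothetical, all oligos are assumed to have the same length and contain only A, C, G, and T, and each "cycle" stands for one mask exposure followed by coupling of one nucleotide to every deprotected feature.

```python
def synthesize(targets: list[str]) -> tuple[list[str], int]:
    """Simulate mask-directed synthesis of equal-length oligos.

    For every position, the four bases are offered in turn. The mask for a
    given (position, base) pair exposes only those features whose target
    sequence has that base at that position. Returns the grown oligos and
    the number of cycles used.
    """
    if not targets:
        return [], 0
    n = len(targets[0])
    if any(len(t) != n or set(t) - set("ACGT") for t in targets):
        raise ValueError("targets must be equal-length strings over ACGT")
    grown = [""] * len(targets)
    cycles = 0
    for position in range(n):
        for base in "ACGT":
            cycles += 1
            for i, target in enumerate(targets):
                if target[position] == base:  # feature exposed by this mask
                    grown[i] += base
    return grown, cycles


products, cycles = synthesize(["ACGT", "TTTT", "GATC"])
print(products, cycles)
```

The cycle count depends only on the oligo length, not on how many distinct features are made, which is the point of the passage: up to 4^n different n-mers can in principle be grown in the same 4n cycles.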

Sample preparation involves the generation of 
double-stranded cDNA from cellular poly(A)+ RNA 
followed by antisense RNA synthesis in an in vitro 
transcription reaction with biotinylated or fluor- 
tagged nucleotides. The RNA probe is then frag- 
mented to facilitate hybridization. If the indirect 
visualization method is used, the chips are incubated 
with fluor-linked streptavidin (e.g., phycoerythrin) 
after hybridization [12,33]. The signal is detected with 
a custom confocal scanner [34]. This method has
been applied successfully to the mapping of genomic
library clones [35], to de novo sequencing by hybrid-
ization [28,36], and to evolutionary sequence com-
parison of the BRCA1 gene [37]. In addition,
mutations in the cystic fibrosis [38] and BRCA1 [39]
gene products and polymorphisms in the human im-
munodeficiency virus-1 clade B protease gene [40]
have been detected by this method. Oligonucleotide
chips are also useful for expression monitoring [33]
as has been demonstrated by the simultaneous evalu-
ation of gene-expression patterns in nearly all open
reading frames of the yeast strain S. cerevisiae [12].
More recently, oligonucleotide chips have been used 
to help identify single nucleotide polymorphisms in 
the human [41] and yeast [42] genomes. 

THE USE OF MICROARRAYS IN TOXICOLOGY 

Screening for Mechanism of Action 

The field of toxicology uses numerous in vivo 
model systems, including the rat, mouse, and rab- 
bit, to assess potential toxicity and these bioassays 
are the mainstay of toxicology testing. However, in 
the past several decades, a plethora of in vitro tech- 
niques have been developed to measure toxicity, 
many of which measure toxicant-induced DNA dam- 
age. Examples of these assays include the Ames test, 
the Syrian hamster embryo cell transformation as- 
say, micronucleus assays, measurements of sister 
chromatid exchange and unscheduled DNA synthe- 
sis, and many others. Fundamental to all of these 
methods is the fact that toxicity is often preceded 
by, and results in, alterations in gene expression. In 
many cases, these changes in gene expression are a 
far more sensitive, characteristic, and measurable
endpoint than the toxicity itself. We therefore pro- 
pose that a method based on measurements of the 
genome-wide gene expression pattern of an organ- 
ism after toxicant exposure is fundamentally infor- 
mative and complements the established methods 
described above. 

We are developing a method by which toxicants 
can be identified and their putative mechanisms of 
action determined by using toxicant-induced gene ex- 
pression profiles. In this method, in one or more de- 
fined model systems, dose and time-course parameters 
are established for a series of toxicants within a given 
prototypic class (e.g., polycyclic aromatic hydrocar- 
bons (PAHs)). Cells are then treated with these agents 
at a fixed toxicity level (as measured by cell survival), 
RNA is harvested, and toxicant-induced gene expres- 
sion changes are assessed by hybridization to a cDNA 
microarray chip (Figure 1). We have developed a cus-
tom DNA chip, called ToxChip v1.0, specifically for
this purpose and will discuss it in more detail below. 
The changes in gene expression induced by the test 
agents in the model systems are analyzed, and the 
common set of changes unique to that class of toxi- 
cants, termed a toxicant signature, is determined. 

This signature is derived by ranking across all ex-
periments the gene-expression data based on rela-
tive fold induction or suppression of genes in treated
samples versus untreated controls and selecting the 
most consistently different signals across the sample 
set. A different signature may be established for each 
prototypic toxicant class. Once the signatures are de- 
termined, gene-expression profiles induced by un- 
known agents in these same model systems can then 
be compared with the established signatures. A match 
assigns a putative mechanism of action to the test 
compound. Figure 2 illustrates this signature method 
for different types of oxidant stressors, PAHs, and 
peroxisome proliferators. In this example, the un- 
known compound in question had a gene-expres- 
sion profile similar to that of the oxidant stressors in 
the database. We anticipate that this general method 
will also reveal cross talk between different pathways 
induced by a single agent (e.g., reveal that a com- 
pound has both PAH-like and oxidant-like proper-
ties). In the future, it may be necessary to distinguish
very subtle differences between compounds within 
a very large sample set (e.g., thousands of highly simi- 
lar structural isomers in a combinatorial chemistry 
library or peptide library). To generate these highly 
refined signatures, standard statistical clustering tech- 
niques or principal-component analysis can be used. 
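
As a rough illustration of the signature method described above, the following Python sketch ranks genes by how consistently they change across the experiments for one toxicant class, keeps the top-ranked genes as that class's signature, and then scores an unknown profile against each signature. The gene symbols, log2 fold-change values, the top_n cutoff, the 1.0 threshold, and all function names are hypothetical placeholders chosen for illustration; they are not data or procedures from the ToxChip studies, and a real analysis would more likely use the clustering or principal-component techniques mentioned above.

```python
# Hypothetical log2 fold changes (treated vs. control), one value per experiment.
OXIDANT_EXPERIMENTS = {
    "HMOX1": [2.1, 1.8, 2.5],
    "GCLM": [1.2, 1.5, 1.1],
    "CYP1A1": [0.1, -0.2, 0.3],
    "ACTB": [0.0, 0.1, -0.1],
}
PAH_EXPERIMENTS = {
    "CYP1A1": [3.0, 2.7, 3.3],
    "CYP1B1": [2.0, 2.2, 1.9],
    "HMOX1": [0.2, -0.1, 0.1],
    "ACTB": [0.1, 0.0, -0.1],
}


def derive_signature(class_experiments, top_n=2):
    """Rank genes by their smallest absolute change across experiments,
    keeping only genes that change in the same direction every time."""
    scored = []
    for gene, ratios in class_experiments.items():
        same_sign = all(r > 0 for r in ratios) or all(r < 0 for r in ratios)
        if not same_sign:
            continue  # inconsistent direction: not part of the signature
        score = min(abs(r) for r in ratios)
        direction = 1 if ratios[0] > 0 else -1
        scored.append((score, gene, direction))
    scored.sort(reverse=True)  # most consistently changed genes first
    return {gene: direction for _, gene, direction in scored[:top_n]}


def match_score(signature, profile, min_abs_log2=1.0):
    """Fraction of signature genes changed in the expected direction."""
    if not signature:
        return 0.0
    hits = sum(
        1
        for gene, direction in signature.items()
        if direction * profile.get(gene, 0.0) >= min_abs_log2
    )
    return hits / len(signature)


def classify(signatures, profile):
    """Return the best-matching class name and the score for every class."""
    scores = {name: match_score(sig, profile) for name, sig in signatures.items()}
    best = max(scores, key=scores.get)
    return best, scores


SIGNATURES = {
    "oxidant stressor": derive_signature(OXIDANT_EXPERIMENTS),
    "PAH": derive_signature(PAH_EXPERIMENTS),
}
# Hypothetical profile induced by an uncharacterized compound.
UNKNOWN_PROFILE = {"HMOX1": 1.9, "GCLM": 1.3, "CYP1A1": 0.2, "CYP1B1": -0.1}

if __name__ == "__main__":
    print(classify(SIGNATURES, UNKNOWN_PROFILE))
```

A compound with mixed properties would show intermediate scores against more than one signature, which is one simple way the cross talk discussed above could surface.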

For the studies outlined in Figure 2, we developed
the custom cDNA microarray chip ToxChip v1.0.

Figure 1. Simplified overview of the method for sample
preparation and hybridization to cDNA microarrays. For illustrative
purposes, samples derived from cell culture are depicted,
although other sample types are amenable to this analysis.
Figure 2. Schematic representation of the method for identification
of a toxicant's mechanism of action. In this method,
gene-expression data derived from exposure of model systems
to known toxicants are analyzed, and a set of changes
characteristic to that type of toxicant (termed the toxicant
signature) is identified. As depicted, oxidant stressors produce
consistent changes in group A genes (indicated by red and
green circles), but not group B or C genes (indicated by gray
circles). The set of gene-expression changes elicited by the
suspected toxicant is then compared with these characteristic
patterns, and a putative mechanism of action is assigned to
the unknown agent.



The 2090 human genes that comprise this subarray 
were selected for their well-documented involve- 
ment in basic cellular processes as well as their re- 
sponses to different types of toxic insult. Included 
on this list are DNA replication and repair genes, 
apoptosis genes, and genes responsive to PAHs and 
dioxin-like compounds, peroxisome proliferators, 
estrogenic compounds, and oxidant stress. Some of 
the other categories of genes include transcription 
factors, oncogenes, tumor suppressor genes, cyclins, 
kinases, phosphatases, cell adhesion and motility 
genes, and homeobox genes. Also included in this 
group are 84 housekeeping genes, whose hybridiza- 
tion intensity is averaged and used for signal nor- 
malization of the other genes on the chip. To date, 
very few toxicants have been shown to have appre- 
ciable effects on the expression of these housekeep- 
ing genes. However, this housekeeping list will be 
revised if new data warrant the addition or deletion 
of a particular gene. Table 1 contains a general de- 
scription of some of the different classes of genes 
that comprise ToxChip v1.0.
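
The housekeeping-based normalization mentioned above can be sketched in a few lines of Python. This is only a minimal illustration of dividing each signal by the averaged housekeeping intensity; the gene symbols, intensity values, and function name are hypothetical and are not taken from the ToxChip design.

```python
from statistics import mean


def normalize_to_housekeeping(intensities, housekeeping_genes):
    """Divide every signal by the mean intensity of the housekeeping genes.

    intensities: {gene: raw hybridization intensity}
    housekeeping_genes: genes assumed to be unaffected by treatment
    """
    hk_values = [intensities[g] for g in housekeeping_genes if g in intensities]
    if not hk_values:
        raise ValueError("no housekeeping genes found on the array")
    hk_mean = mean(hk_values)
    if hk_mean <= 0:
        raise ValueError("housekeeping mean must be positive")
    return {gene: value / hk_mean for gene, value in intensities.items()}


# Hypothetical raw intensities from one hybridization.
RAW = {"ACTB": 1000.0, "GAPDH": 3000.0, "CYP1A1": 500.0}
NORMALIZED = normalize_to_housekeeping(RAW, ["ACTB", "GAPDH"])

if __name__ == "__main__":
    print(NORMALIZED)
```

Averaging over many housekeeping genes, rather than relying on one, makes the normalization less sensitive to any single gene that a toxicant happens to affect.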

When a toxicant signature is determined, the 
genes within this signature are flagged within the 
database. When uncharacterized toxicants are then 
screened, the data can be quickly reformatted so that 
blocks of genes representing the different signatures
are displayed [11]. This facilitates rapid, visual interpretation
of data. We are also developing ToxChip
v2.0 and chips for other model systems,
including rat, mouse, Xenopus, and yeast, for use in 
toxicology studies. 

Animal Models in Toxicology Testing 

The toxicology community relies heavily on the 
use of animals as model systems for toxicology test- 
ing. Unfortunately, these assays are inherently ex- 
pensive, require large numbers of animals and take a 
long time to complete and analyze. Therefore, the 
National Institute of Environmental Health Sciences 
(NIEHS), the National Toxicology Program, and the 
toxicology community at large are committed to re- 
ducing the number of animals used, by developing 
more efficient and alternative testing methodologies. 
Although substantial progress has been made in the 
development of alternative methods, bioassays are 
still used for testing endpoints such as neurotoxic- 
ity, immunotoxicity, reproductive and developmen- 
tal toxicology, and genetic toxicology. The rodent 
cancer bioassay is a particularly expensive and time- 
consuming assay, as it requires almost 4 yr, 1200 
animals, and millions of dollars to execute and ana- 
lyze [43]. In vitro experiments of the type outlined 
in Figure 2 might provide evidence that an unknown 
Table 1. ToxChip v1.0: A Human cDNA Microarray
Chip Designed to Detect Responses to Toxic Insult

Gene category                            No. of genes on chip

Apoptosis                                 72
DNA replication and repair                99
Oxidative stress/redox homeostasis        90
Peroxisome proliferator responsive        22
Dioxin/PAH responsive                     12
Estrogen responsive                       63
Housekeeping                              84
Oncogenes and tumor suppressor genes      76
Cell-cycle control                        51
Transcription factors                    131
Kinases                                  276
Phosphatases                              88
Heat-shock proteins                       23
Receptors                                349
Cytochrome P450s                          30

*This list is intended as a general guide. The gene categories are not
unique, and some genes are listed in multiple categories.

agent is (or is not) responsible for eliciting a given 
biological response. This information would help to 
select a bioassay more specifically suited to the agent 
in question or perhaps suggest that a bioassay is not 
necessary, which would dramatically reduce cost, 
animal use, and time. 

The addition of microarray techniques to stan- 
dard bioassays may dramatically enhance the sen- 
sitivity and interpretability of the bioassay and 
possibly reduce its cost. Gene-expression signatures 
could be determined for various types of tissue-spe- 
cific toxicants, and new compounds could be 
screened for these characteristic signatures, provid- 
ing a rapid and sensitive in vivo test. Also, because 
gene expression is often exquisitely sensitive to low 
doses of a toxicant, the combination of gene-expres- 
sion screening and the bioassay might allow the use 
of lower toxicant doses, which are more relevant to 
human exposure levels, and the use of fewer ani- 
mals. In addition, gene-expression changes are nor- 
mally measured in hours or days, not in the months 
to years required for tumor development. Further- 
more, microarrays might be particularly useful for 
investigating the relationship between acute and 
chronic toxicity and identifying secondary effects 
of a given toxicant by studying the relationship 
between the duration of exposure to a toxicant and 
the gene-expression profile produced. Thus, a bio- 
assay that incorporates gene-expression signatures 
with traditional endpoints might be substantially 
shorter, use more realistic dose regimens, and cost 
substantially less than the current assays do. 

These considerations are also relevant for branches 
of toxicology not related to human health and not 
using rodents as model systems, such as aquatic toxi- 
cology and plant pathology. Bioassays based on the 
fathead minnow, Daphnia, and Arabidopsis could
also be improved by the addition of microarray analysis.
The combination of microarrays with traditional
bioassays might also be useful for investigating some 
of the more intractable problems in toxicology re- 
search, such as the effects of complex mixtures and 
the difficulties in cross-species extrapolation. 

Exposure Assessment, Environmental Monitoring, 
and Drug Safety 

The currently used methods for assessment of ex- 
posure to chemical toxicants are based on measure- 
ment of tissue toxin levels or on surrogate markers 
of toxicity, termed biomarkers (e.g., peripheral blood 
levels of hepatic enzymes or DNA adducts). Because 
gene expression is a sensitive endpoint, gene expres- 
sion as measured with microarray technology may 
be useful as a new biomarker to more precisely iden- 
tify hazards and to assess exposure. Similarly, 
microarrays could be used in an environmental- 
monitoring capacity to measure the effect of poten- 
tial contaminants on the gene-expression profiles 
of resident organisms. In an analogous fashion, 
microarrays could be used to measure gene-expres- 
sion endpoints in subjects in clinical trials. The com- 
bination of these gene-expression data and more 
established toxic endpoints in these trials could be 
used to define highly precise surrogates of safety. 

Gene-expression profiles in samples from exposed
individuals could be compared to the profiles of the 
same individuals before exposure. From this infor- 
mation, the nature of the toxic exposure can be de- 
termined or a relative clinical safety factor estimated. 
In the future it may also be possible to estimate not 
only the nature but the dose of the toxicant for a 
given exposure, based on relative gene-expression 
levels. This general approach may be particularly 
appropriate for occupational-health applications, in 
which unexposed and exposed samples from the 
same individuals may be obtainable. For example, 
a pilot study of gene expression in peripheral-blood 
lymphocytes of Polish coke-oven workers exposed 
to PAHs (and many other compounds) is under con- 
sideration at the NIEHS. An important consideration 
for these types of studies is that gene expression can 
be affected by numerous factors, including diet, 
health, and personal habits. To reduce the effects 
of these confounding factors, it may be necessary 
to compare pools of control samples with pools of 
treated samples. In the future it may be possible to 
compare exposed sample sets to a national database 
of human-expression data, thus eliminating the 
need to provide an unexposed sample from the same 
individual. Efforts to develop such a national gene- 
expression database are currently under way [44,45]. 
However, this national database approach will re- 
quire a better understanding of genome-wide gene 
expression across the highly diverse human popu- 
lation and of the effects of environmental factors 
on this expression. 
Alleles, Oligo Arrays, and Toxicogenetics

Gene sequences vary between individuals, and 
this variability can be a causative factor in human 
diseases of environmental origin [46,47]. A new area 
of toxicology, termed toxicogenetics, was recently 
developed to study the relationship between genetic 
variability and toxicant susceptibility. This field is 
not the subject of this discussion, but it is worth- 
while to note that the ability of oligonucleotide ar- 
rays to discriminate DNA molecules based on single 
base-pair differences makes these arrays uniquely 
useful for this type of analysis. Recent reports dem- 
onstrated the feasibility of this approach [41,42]. 
The NIEHS has initiated the Environmental Genome 
Project to identify common sequence polymor- 
phisms in 200 genes thought to be involved in en- 
vironmental diseases [48]. In a pilot study on the 
feasibility of this application to the Environmental 
Genome Project, oligonucleotide arrays will be used 
to resequence 20 candidate genes. This toxicogenetic 
approach promises to dramatically improve our un- 
derstanding of interindividual variability in disease 
susceptibility. 

FUTURE PRIORITIES 

There are many issues that must be addressed be- 
fore the full potential of microarrays in toxicology 
research can be realized. Among these are model sys- 
tem selection, dose selection, and the temporal na- 
ture of gene expression. In other words, in which 
species, at what dose, and at what time do we look 
for toxicant-induced gene expression? If human 
samples are analyzed, how variable is global gene 
expression between individuals, before and after toxi- 
cant exposure? What are the effects of age, diet, and 
other factors on this expression? Experience, in the 
form of large data sets of toxicant exposures, will 
answer these questions. 

One of the most pressing issues for array scientists 
is the construction of a national public database 
(linked to the existing public databases) to serve as a 
repository for gene-expression data. This relational 
database must be made available for public use, and 
researchers must be encouraged to submit their ex- 
pression data so that others may view and query the 
information. Researchers at the National Institutes 
of Health have made laudable progress in develop- 
ing the first generation of such a database [44,45]. In 
addition, improved statistical methods for gene clus- 
tering and pattern recognition are needed to ana- 
lyze the data in such a public database. 

The proliferation of different platforms and meth- 
ods for microarray hybridizations will improve 
sample handling and data collection and analysis and 
reduce costs. However, the variety of microarray 
methods available will create problems of data compatibility
between platforms. In addition, the near-infinite
variety of experimental conditions under
which data will be collected by different laboratories
will make large-scale data analysis extremely difficult.
To help circumvent these future problems, a
set of standards to be included on all platforms 
should be established. These standards would facili- 
tate data entry into the national database and serve 
as reference points for cross-platform and inter-labo- 
ratory data analysis. 

Many issues remain to be resolved, but it is clear 
that new molecular techniques such as microarray 
hybridization will have a dramatic impact on toxicol- 
ogy research. In the future, the information gathered 
from microarray-based hybridization experiments will 
form the basis for an improved method to assess the 
impact of chemicals on human and environmental 
health. 
1. Introduction

The majority of drugs act by binding to protein 
targets, most to known proteins representing en- 
zymes, receptors and channels, resulting in effects 
such as enzyme inhibition and impairment of 
signal transduction. The treatment-induced per- 
turbations provoke feedback reactions aiming to 
compensate for the stimulus, which almost always 
are associated with signals to the nucleus, resulting
in altered gene expression. Such gene expression
regulations account for both the
pharmacological action and the toxicity of a drug
and can be visualized by either global mRNA or 
global protein expression profiling. Hence, for 
each individual drug, a characteristic gene regula- 
tion pattern, its molecular fingerprint, exists 
which bears valuable information on its mode of 
action and its mechanism of toxicity. 

Gene expression is a multistep process that 
results in an active protein (Fig. 1). There exist
numerous regulation systems that exert control at 
and after the transcription and the translation 
step. Genomics, by definition, encompasses the 
quantitative analysis of transcripts at the mRNA 
level, while the aim of proteomics is to quantify 
gene expression further down-stream, creating a 
snapshot of gene regulation closer to ultimate cell 
function control. 
2. Global mRNA profiling

Expression data at the mRNA level can be 
produced using a set of different technologies 
such as DNA microarrays, reverse transcript 
imaging, amplified fragment length polymorphism 
(AFLP), serial analysis of gene expression 
(SAGE) and others. Currently, DNA microarrays 
are very popular and promise a great potential. 
On a typical array, each gene of interest is represented
either by a long DNA fragment (200-2400
bp) typically generated by polymerase chain reaction
(PCR) and spotted on a suitable substrate
using robotics (Schena et al., 1995; Shalon et al.,
1996) or by several short oligonucleotides (20-30
bp) synthesized directly onto a solid support using
photolabile nucleotide chemistry (Fodor et al.,
1991; Chee et al., 1996). From control and treated
tissues, total RNA or mRNA is isolated and 
reverse transcribed in the presence of radioactive 
or fluorescent labeled nucleotides, and the labeled 
probes are then hybridized to the arrays. The 
intensity of the array signal is measured for each 
gene transcript by either autoradiography or laser 
scanning confocal microscopy. The ratio between 
the signals of control and treated samples reflects
the relative drug-induced change in transcript 
abundance. 
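
The control-versus-treated comparison described above is often expressed as a log2 ratio, so that induction and suppression are symmetric around zero. The Python sketch below is one minimal way to compute it; the gene names, intensities, the background floor, and the function name are assumptions made for illustration rather than part of any method cited here.

```python
import math


def log2_ratios(control, treated, floor=1.0):
    """Return log2(treated / control) for genes measured in both samples.

    Signals below `floor` are raised to `floor` so that near-zero
    background readings do not produce extreme or undefined ratios.
    """
    ratios = {}
    for gene in control.keys() & treated.keys():
        c = max(control[gene], floor)
        t = max(treated[gene], floor)
        ratios[gene] = math.log2(t / c)
    return ratios


# Hypothetical array signals for two genes.
CONTROL = {"geneA": 200.0, "geneB": 800.0}
TREATED = {"geneA": 800.0, "geneB": 400.0}
RATIOS = log2_ratios(CONTROL, TREATED)

if __name__ == "__main__":
    for gene in sorted(RATIOS):
        print(gene, RATIOS[gene])
```

In this made-up example geneA is induced fourfold (log2 ratio of 2) and geneB is suppressed twofold (log2 ratio of -1).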



3. Global protein profiling 

Global quantitative expression analysis at the 
protein level is currently restricted to the use of
two-dimensional gel electrophoresis. This tech- 
nique combines separation of tissue proteins by 
isoelectric focusing in the first dimension and by 
sodium dodecyl sulfate slab gel electrophoresis- 
based molecular weight separation on the second, 
orthogonal dimension (Anderson et al., 1991). 
The product is a rectangular pattern of protein 
spots that are typically revealed by Coomassie 
Blue, silver or fluorescent staining (Fig. 2).
Protein spots are identified by mass spectrometry 
following generation of peptide mass fingerprints 
(Mann et al., 1993) and sequence tags (Wilkins et 
al., 1996). Similar to the mRNA approach, the
optical densities of spots from control and treated
samples are compared to search for
treatment-related changes.



4. Expression data analysis 

Bioinformatics forms a key element required to 
organize, analyze and store expression data from 
either source, the mRNA or the protein level. The 
overall objective, once a mass of high-quality 
quantitative expression data has been collected, is
to visualize complex patterns of gene expression 
changes, to detect pathways and sets of genes 
tightly correlated with treatment efficacy and toxi- 
city, and to compare the effects of different sets of 
treatment (Anderson et al., 1996). As the drug 
effect database is growing, one may detect similar- 
ities and differences between the molecular finger- 
prints produced by various drugs, information 
that may be crucial to make a decision whether to 
refocus or extend the therapeutic spectrum of a 
drug candidate. 



5. Comparison of global mRNA and protein 
expression profiling 

There are several synergies and overlaps of data 
obtained by mRNA and protein expression analysis.
Low-abundance transcripts may not be easily
quantified at the protein level using standard two- 
dimensional gel electrophoresis analysis and their 
detection may require prefractionation of sam- 
ples. The expression of such genes may be prefer- 
ably quantified at the mRNA level using 
techniques allowing PCR-mediated target amplification.
Tissue biopsy samples typically yield good
quality of both mRNA and proteins; however, the 
quality of mRNA isolated from body fluids is 
often poor due to the faster degradation of 
mRNA when compared with proteins. RNA sam- 
ples from body fluids such as serum or urine are 
often not very meaningful, and secreted proteins
are likely more reliable surrogate markers for 
treatment efficacy and safety. Detection of post- 
translational modifications, events often related to 
function or nonfunction of a protein, is restricted 
to protein expression analysis and rarely can be 
predicted by mRNA profiling. Information on 
subcellular localization and translocation of 
proteins has to be acquired at the level of the 
protein in combination with sample prefractiona- 
tion procedures. The growing evidence of a poor 
correlation between mRNA and protein abun- 
dance (Anderson and Seilhamer, 1997) further 
suggests that the two approaches, mRNA and 
protein profiling, are complementary and should 
be applied in parallel. 
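
To make the notion of a poor mRNA-protein correlation concrete, the Python sketch below computes a Pearson correlation coefficient between paired abundance measurements. The abundance values and the function name are hypothetical and are not data from Anderson and Seilhamer (1997); the sketch only shows how such a coefficient could be obtained.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical relative abundances for five gene products.
MRNA_ABUNDANCE = [1.0, 2.0, 3.0, 4.0, 5.0]
PROTEIN_ABUNDANCE = [3.0, 1.0, 5.0, 2.0, 4.0]
R = pearson_r(MRNA_ABUNDANCE, PROTEIN_ABUNDANCE)

if __name__ == "__main__":
    print(round(R, 3))
```

A coefficient this far below 1 would mean that transcript levels alone predict protein levels only weakly, which is the argument for running both kinds of profiling.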



6. Expression profiling and drug development 

Understanding the mechanisms of action and 
toxicity, and being able to monitor treatment 
efficacy and safety during trials is crucial for the 
successful development of a drug. Mechanistic 
insights are essential for the interpretation of drug 
effects and enhance the chances of recognizing 
potential species specificities contributing to an 
improved risk profile in humans (Richardson et 
al., 1993; Steiner et al., 1996b; Aicher et al., 1998).
The value of expression profiling further increases 
when links between treatment-induced expression 
profiles and specific pharmacological and toxic 
endpoints are established (Anderson et al., 1991,
1995, 1996; Steiner et al., 1996a). Changes in gene
expression are known to precede the manifesta- 
tion of morphological alterations, giving expres- 
sion profiling a great potential for early 
compound screening, enabling one to select drug 
candidates with wide therapeutic windows 
reflected by molecular fingerprints indicative of 
high pharmacological potency and low toxicity 
(Arce et al., 1998). In later phases of drug development,
surrogate markers of treatment efficacy
and toxicity can be applied to optimize the monitoring
of pre-clinical and clinical studies (Doherty
et al., 1998).



7. Perspectives 

The basic methodology of safety evaluation has 
changed little during the past decades. Toxicity in 
laboratory animals has been evaluated primarily 
by using hematological, clinical chemistry and 
histological parameters as indicators of organ 
damage. The rapid progress in genomics and pro- 
teomics technologies creates a unique opportunity 
to dramatically improve the predictive power of 
safety assessment and to accelerate the drug devel- 
opment process. Application of gene and protein 
expression profiling promises to improve lead se- 
lection, resulting in the development of drug can- 
didates with higher efficacy and lower toxicity. 
The identification of biologically relevant surro- 
gate markers correlated with treatment efficacy 
and safety bears a great potential to optimize the 
monitoring of pre-clinical and clinical trials.



References 

Aicher, L., Wahl, D., Arce, A., Grenet, O., Steiner, S., 1998.
New insights into cyclosporine A nephrotoxicity by proteome
analysis. Electrophoresis 19, 1998-2003.
Anderson, N.L., Seilhamer, J., 1997. A comparison of selected
mRNA and protein abundances in human liver.
Electrophoresis 18, 533-537.
Anderson, N.L., Esquer-Blasco, R., Hofmann, J.P., Anderson,
N.G., 1991. A two-dimensional gel database of rat liver
proteins useful in gene regulation and drug effects studies.
Electrophoresis 12, 907-930.
Anderson, L., Steele, V.K., Kelloff, G.J., Sharma, S., 1995.
Effects of oltipraz and related chemoprevention compounds
on gene expression in rat liver. J. Cell. Biochem.
Suppl. 22, 108-116.
Anderson, N.L., Esquer-Blasco, R., Richardson, F., Foxworthy,
P., Eacho, P., 1996. The effects of peroxisome proliferators
on protein abundances in mouse liver. Toxicol. Appl.
Pharmacol. 137, 75-89.
Arce, A., Aicher, L., Wahl, D., Esquer-Blasco, R., Anderson,
N.L., Cordier, A., Steiner, S., 1998. Changes in the liver
proteome of female Wistar rats treated with the hypoglycemic
agent SDZ PGU 693. Life Sci. 63, 2243-2250.



S. Steiner, N.L. Anderson / Toxicology Letters 112-113 (2000) 467-471



Chee, M., Yang, R., Hubbell, E., Berno, A., Huang, X.C.,
Stern, D., Winkler, J., Lockhart, D.J., Morris, M.S.,
Fodor, S.P., 1996. Accessing genetic information with
high-density DNA arrays. Science 274, 610-614.
Doherty, N.S., Littman, B.H., Reilly, K., Swindell, A.C., Buss,
J., Anderson, N.L., 1998. Analysis of changes in acute-phase
plasma proteins in an acute inflammatory response
and in rheumatoid arthritis using two-dimensional gel
electrophoresis. Electrophoresis 19, 355-363.
Fodor, S.P., Read, J.L., Pirrung, M.C., Stryer, L., Lu, A.T.,
Solas, D., 1991. Light-directed, spatially addressable parallel
chemical synthesis. Science 251, 767-773.
Mann, M., Hojrup, P., Roepstorff, P., 1993. Use of mass
spectrometric molecular weight information to identify
proteins in sequence databases. Biol. Mass Spectrom.
338-345.
Richardson, F.C., Strom, S.C., Copple, D.M., Bendele, R.A.,
Probst, G.S., Anderson, N.L., 1993. Comparisons of
protein changes in human and rodent hepatocytes induced
by the rat-specific carcinogen, methapyrilene.
Electrophoresis 14, 157-161.




Schena, M., Shalon, D., Davis, R.W., Brown, P.O., 1995.
Quantitative monitoring of gene expression patterns with
a complementary DNA microarray. Science 270, 467-470.
Shalon, D., Smith, S.J., Brown, P.O., 1996. A DNA microarray
system for analyzing complex DNA samples using
two-color fluorescent probe hybridization. Genome Res. 6,
639-645.
Steiner, S., Wahl, D., Mangold, B.L.K., Robison, R., Raymackers,
J., Meheus, L., Anderson, N.L., Cordier, A.,
1996a. Induction of the adipose differentiation-related
protein in liver of etomoxir treated rats. Biochem. Biophys.
Res. Commun. 218, 777-782.
Steiner, S., Aicher, L., Raymackers, J., Meheus, L., Esquer-Blasco,
R., Anderson, L., Cordier, A., 1996b. Cyclosporine
A mediated decrease in the rat renal calcium binding
protein calbindin-D 28 kDa. Biochem. Pharmacol. 51,
253-258.
Wilkins, M.R., Gasteiger, E., Sanchez, J.C., Appel, R.D.,
Hochstrasser, D.F., 1996. Protein identification with sequence
tags. Curr. Biol. 6, 1543-1544.






Workshop Summary



Docket No.: PC-0044 CIP 
USSN: 09/895,686 
Ref. No. 4 of 6 



Application of DNA Arrays to Toxicology 

John C. Rockett and David J. Dix

Reproductive Toxicology Division, National Health and Environmental Effects Research Laboratory, U.S. Environmental Protection
Agency, Research Triangle Park, North Carolina, USA



DNA array technology makes it possible to rapidly genotype individuals or quantify the expression
of thousands of genes on a single filter or glass slide, and holds enormous potential in toxicologic
applications. This potential led to a U.S. Environmental Protection Agency-sponsored workshop
titled "Application of Microarrays to Toxicology" on 7-8 January 1999 in Research Triangle Park,
North Carolina. In addition to providing state-of-the-art information on the application of DNA or
gene microarrays, the workshop catalysed the formation of several collaborations, committees, and
user's groups throughout the Research Triangle Park area and beyond. Potential applications of
microarrays to toxicologic research and risk assessment include genome-wide expression analyses to
identify gene-expression networks and toxicant-specific signatures that can be used to define mode
of action, for exposure assessment, and for environmental monitoring. Arrays may also prove useful
for monitoring genetic variability and its relationship to toxicant susceptibility in human populations.
Key words: DNA arrays, gene arrays, microarrays, toxicology. Environ Health Perspect
107:681-685 (1999). [Online 6 July 1999]
Decoding the genetic blueprint is a dream that
offers manifold returns in terms of understanding
how organisms develop and function in an
often hostile environment. With the rapid
advances in molecular biology over the last 30
years, the dream has come a step closer to reality.
Molecular biologists now have the ability to
elucidate the composition of any genome.
Indeed, almost 20 genomes have already been
sequenced and more than 60 are currently
under way. Foremost among these is the
Human Genome Mapping Project. However,
the genomes of a number of commonly used
laboratory species are also under intensive
investigation, including yeast, Arabidopsis,
maize, rice, zebra fish, mouse, rat, and dog. It
is widely expected that the completion of such
programs will facilitate the development of
many powerful new techniques and approaches
to diagnosing and treating genetically and
environmentally induced diseases which afflict
mankind. However, the vast amount of data
being generated by genome mapping will
require new high-throughput technologies to
investigate the function of the millions of new
genes that are being reported. Among the most
widely heralded of the new functional
genomics technologies are DNA arrays, which
represent perhaps the most anticipated new
molecular biology technique since polymerase
chain reaction (PCR).

Arrays enable the study of literally thousands
of genes in a single experiment. The
potential importance of arrays is enormous and
has been highlighted by the recent publication
of an entire Nature Genetics supplement dedicated
to the technology (1). Despite this huge
surge of interest, DNA arrays are still little used
and largely unproven, as demonstrated by the
high ratio of review and press articles to actual
data papers. Even so, the potential they offer
has driven venture capitalists into a frenzy of
investment and many new companies are
springing up to claim a share of this rapidly
developing market.

The U.S. Environmental Protection
Agency (EPA) is interested in applying DNA
array technology to ongoing toxicologic studies.
To learn more about the current state of
the technology, the Reproductive Toxicology
Division (RTD) of the National Health and
Environmental Effects Research Laboratory
(NHEERL; Research Triangle Park, NC)
hosted a workshop on "Application of
Microarrays to Toxicology" on 7-8 January
1999 in Research Triangle Park, North
Carolina. The workshop was organized by
David Dix, Robert Kavlock, and John Rockett
of the RTD/NHEERL. Twenty-two intramural
and extramural scientists from government,
academia, and industry shared information,
data, and opinions on the current and
future applications for this exciting new technology.
The workshop had more than 150
attendees, including researchers, students, and
administrators from the EPA, the National
Institute of Environmental Health Sciences
(NIEHS), and a number of other establishments
from Research Triangle Park and
beyond. Presentations ranged from the technology
behind array production through the
sharing of actual experimental data and projections
on the future importance and applications
of arrays. The information contained in
the workshop presentations should provide aid
and insight into arrays in general and their
application to toxicology in particular.

Array Elements

In the context of molecular biology, the word
"array" is normally used to refer to a series of
DNA or protein elements firmly attached in
a regular pattern to some kind of supportive
medium. DNA array is often used interchangeably
with gene array or microarray.
Although not formally defined, microarray is
generally used to describe the higher density
arrays typically printed on glass chips. The
DNA elements that make up DNA arrays
can be oligonucleotides, partial gene
sequences, or full-length cDNAs. Companies
offering pre-made arrays that contain less
than full-length clones normally use regions
of the genes which are specific to that gene to
prevent false positives arising through cross-hybridization.
Sequence verification of
cDNA clone identity is necessary because of
errors in identifying specific clones from
cDNA libraries and databases. Premade
DNA arrays printed on membranes are currently
or imminently available for human,
mouse, and rat. In most cases they contain
DNA sequences representing several thousand
different sequence clusters or genes as
delineated through the National Center for
Biotechnology Information UniGene Project
(2). Many of these different UniGene clusters
(putative genes) are represented only by
expressed sequence tags (ESTs).

Array Printing 

Arrays are typically printed on one of two
types of support matrix. Nylon membranes
are used by most off-the-shelf array providers
such as Clontech Laboratories, Inc.
(Palo Alto, CA), Genome Systems, Inc. (St.
Louis, MO), and Research Genetics, Inc.
(Huntsville, AL). Microarrays such as those
produced by Affymetrix, Inc. (Santa Clara,
CA), Incyte Pharmaceuticals, Inc. (Palo Alto,
CA), and many do-it-yourself (DIY) arraying
groups use glass wafers or slides. Although
standard microscope slides may be used, they
must be preprepared to facilitate sticking
of the DNA to the glass. Several different
coatings have been successfully used, including
silane and lysine. The coating of slides
can easily be carried out in the laboratory,
but many prefer the convenience of precoated
slides available from suppliers.

Once the support matrix has been prepared,
the DNA elements can be applied by
several methods. Affymetrix, Inc., has
developed a unique photolithographic technology
for attaching oligonucleotides to glass wafers.
More commonly, DNA is applied by either
noncontact or contact printing. Noncontact
printers can use thermal, solenoid, or
piezoelectric technology to spray aliquots of
solution onto the support matrix and may be
used to produce slide or membrane-based arrays.
Cartesian Technologies, Inc. (Irvine, CA) has
developed nQUAD technology for use in its
PixSys printers. The system couples a syringe
pump with the microsolenoid valve, a
combination that provides rapid quantitative
dispensing of nanoliter volumes (down to 4.2 nL)
over a variable volume range. A different approach
to noncontact printing uses a solid pin and ring
combination (Genetic MicroSystems, Inc.,
Woburn, MA). This system (Figure 1) allows a
broader range of sample, including cell
suspensions and particulates, because the printing
head cannot be blocked up in the same way as
a spray nozzle. Fluid transfer is controlled in
this system primarily by the pin dimensions
and the force of deposition, although the
nature of the support matrix and the sample
will also affect transfer to some degree.

In contact printing, the pin head is dipped
in the sample and then touched to the support
matrix to deposit a small aliquot. Split pins
were one of the first contact-printing devices
to be reported and are the suggested format
for DIY arrayers, as described by Brown (3).
Split pins are small metal pins with a precise
groove cut vertically in the middle of the pin
tip. In this system, 1-48 split pins are
positioned in the pin head. The split pins work by
simple capillary action, not unlike a fountain
pen: when the pin heads are dipped in the
sample, liquid is drawn into the pin groove. A
small (fixed) volume is then deposited each
time the split pins are gently touched to
the support matrix. Sample (100-500 pL
depending on a variety of parameters) can be
deposited on multiple slides before refilling is
required, and array densities of > 2,500
spots/cm^2 may be produced. The deposit
volume depends on the split size, sample
fluidity, and the speed of printing. Split pins are
relatively simple to produce and can be made
in-house if a suitable machine shop is
available. Alternatively, they can be obtained
directly from companies such as TeleChem
International, Inc. (Sunnyvale, CA).
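
To give a feel for what a density of 2,500 spots/cm^2 means physically, the arithmetic can be sketched as below. This is an illustration only: it assumes spots are laid out on a regular square grid, and the 2 cm x 5 cm printable area is a hypothetical figure, not one taken from the text.

```python
import math


def center_to_center_pitch_um(spots_per_cm2: float) -> float:
    """Pitch in micrometers of a square grid holding `spots_per_cm2` spots per cm^2."""
    spots_per_cm = math.sqrt(spots_per_cm2)  # spots along one 1 cm edge
    return 10_000.0 / spots_per_cm           # 1 cm = 10,000 micrometers


def area_capacity(spots_per_cm2: float, width_cm: float, height_cm: float) -> int:
    """Whole number of spots that fit in a printable area (dimensions are assumptions)."""
    return int(spots_per_cm2 * width_cm * height_cm)


pitch = center_to_center_pitch_um(2500)
print(f"pitch at 2,500 spots/cm^2: {pitch:.0f} um")
print(f"spots in a hypothetical 2 cm x 5 cm area: {area_capacity(2500, 2, 5)}")
```

The pitch is the center-to-center distance, so the spot diameter itself has to be smaller than this value to keep neighboring spots from merging.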

Irrespective of their source, printers
should be run through a preprint sequence
prior to producing the actual experimental
arrays; the first 100 or so spots of a new run
tend to be somewhat variable. Factors
affecting spot reproducibility include slide
treatment homogeneity, sample differences, and
instrument errors. Other factors that come
into play include clean ejection of the drop
and clogging (nQUAD printing), and
mechanical variations and long-term alteration
in the print-head surface of solid and split
pins. However, with careful preparation it is
possible to get a coefficient of variation for
spot reproducibility below 10%.
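
The 10% figure can be checked for any print run by computing the coefficient of variation over replicate spots. The sketch below is a minimal illustration; the intensity values are invented for the example and the function name is not taken from any array software.

```python
import statistics


def coefficient_of_variation(intensities):
    """Return the CV in percent: 100 * sample standard deviation / mean."""
    mean = statistics.mean(intensities)
    if mean == 0:
        raise ValueError("mean intensity is zero; CV is undefined")
    return 100.0 * statistics.stdev(intensities) / mean


# Hypothetical replicate spot intensities (arbitrary units), not measured data.
replicates = [980, 1010, 1040, 995, 1025, 950]
cv = coefficient_of_variation(replicates)
status = "within the 10% target" if cv < 10 else "above the 10% target"
print(f"CV = {cv:.1f}% ({status})")
```

In practice the spots from the preprint sequence would be excluded before the calculation, since the text notes that the first 100 or so spots of a run are more variable.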

One potential printing problem is sample
carryover. Repeated washing, blotting, and
drying (vacuum) of print pins between samples
is normally effective at reducing sample
carryover to negligible amounts. Printing should
also be carried out in a controlled
environment. Humidified chambers are available in
which to place printers. These help prevent
dust contamination and produce a uniform
drying rate, which is important in determining
spot size, quality, and reproducibility.

In summary, although several printing
technologies are available, none are
particularly outstanding and the bottom line
is that they are still in a relatively early stage
of evolution.

Array Hybridization 

The hybridization protocol is, practically
speaking, relatively straightforward and those
with previous experience in blotting should
have little difficulty. Array hybridizations
are, in essence, reverse Southern/Northern
blots: instead of applying a labeled probe to
the target population of DNA/RNA, the
labeled population is applied to the probe(s).
With membrane-based arrays, the control and
treated mRNA populations are normally
converted to cDNA and labeled with isotope (e.g.,
35?) in the process. These labeled populations
are then hybridized independently to parallel
or serial arrays and the hybridization signal is
detected with a phosphorimager. A less
commonly used alternative to radioactive probes is
enzymatic detection. The probe may be
biotinylated, haptenylated, or have alkaline
phosphatase/horseradish peroxidase attached.
Hybridization is detected by enzymatic
reaction yielding a color reaction (4). Differences
in hybridization signals can be detected by eye
or, more accurately, with the help of digital
imaging and commercially available software.
The labeling of the test populations for
slide-based microarrays uses a slightly different
approach. The probe typically consists of two
samples of polyA+ RNA (usually from a treated
and a control population) that are converted to
cDNA; in the process each is labeled with a
different fluor. The independently labeled
probes are then mixed together and hybridized
to a single microarray slide and the resulting
combined fluorescent signal is scanned. After




Figure 1. Genetic MicroSystems (Woburn, MA) pin
ring system for printing arrays. The pin ring com- 
bination consists of a circular open ring oriented 
parallel to the sample solution, with a vertical pin 
centered over the ring. When the ring is dipped 
into a solution and lifted, it withdraws an aliquot 
of sample held by surface tension. To spot the 
sample, the pin is driven down through the ring 
and a portion of the solution is transferred to the 
bottom of the pin. The pin continues to move 
downward until the pendant drop of solution 
makes contact with the underlying surface. The 
pin is then lifted, and gravity and surface tension 
cause deposition of the spot onto the array. 
Figure from Flowers et al. (14), with permission
from Genetic MicroSystems.

normalization, it is possible to determine the 
ratio of fluorescent signals from a single 
hybridization of a slide-based microarray. 
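
The text does not say which normalization is applied before the ratio is taken. As one possible reading, the sketch below rescales one channel so that both channels share the same median and then reports a log2 ratio per spot. The choice of global median scaling, the assignment of Cy5 to the treated sample and Cy3 to the control, and all intensity values are assumptions made for illustration.

```python
import math
import statistics


def normalized_log2_ratios(cy5, cy3):
    """Scale Cy3 to the Cy5 median, then return log2(Cy5 / scaled Cy3) for each spot."""
    scale = statistics.median(cy5) / statistics.median(cy3)
    return [math.log2(r / (g * scale)) for r, g in zip(cy5, cy3)]


# Hypothetical background-subtracted intensities for five spots.
cy5_treated = [200.0, 400.0, 800.0, 1600.0, 400.0]
cy3_control = [100.0, 200.0, 400.0, 200.0, 200.0]

for spot, ratio in enumerate(normalized_log2_ratios(cy5_treated, cy3_control), start=1):
    print(f"spot {spot}: log2 ratio = {ratio:+.2f}")
```

A log2 ratio of 0 means no change between the two populations, and each unit corresponds to a 2-fold difference.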

cDNA derived from control and treated
populations of RNA is most commonly
hybridized to arrays, although subtractive
hybridization or differential display reactions
may also be used. Fluorophore- or
radiolabeled nucleotides are directly incorporated
into the cDNA in the process of converting
RNA to cDNA. Alternatively, 5' end-labeled
primers may be used for cDNA synthesis.
These are labeled with a fluorophore for
direct visualization of the hybridized array.
Alternatively, biotin or a hapten may be
attached to the primer, in which case
fluor-labeled streptavidin or antibody must be
applied before a signal can be generated. The
most commonly used fluorophores at present
are cyanine (Cy)3 and Cy5 (Amersham
Pharmacia Biotech AB, Uppsala, Sweden).
However, the relative expense of these
fluorescent conjugates has driven a search for
cheaper alternatives. Fluorescein, rhodamine,
and Texas red have all been used, and
companies such as Molecular Probes, Inc.
(Eugene, OR) are developing a series of
labeled nucleotides with a wide range of
excitation and emission spectra which may prove
to function as well as the Cy dyes.

Table 1. Advantages and disadvantages of different microarray scanning systems.

CCD camera system
  Advantages:    Few moving parts
                 Fast scanning of bright samples
  Disadvantages: Less appropriate for dim samples
                 Optical scatter can limit performance

Nonconfocal laser scanner
  Advantages:    Relatively simple optics
  Disadvantages: Low light collection efficiency
                 Background artifacts not rejected
                 Resolution typically low

Confocal laser scanner
  Advantages:    Small depth of focus reduces artifacts
                 May have high light collection efficiency
  Disadvantages: Small depth of focus requires scanning precision


Analysis of DNA Microarrays

Membrane-based arrays are normally analyzed
on film or with a phosphorimager, whereas
chip-based arrays require more specialized
scanning devices. These can be divided into three
main groups: the charge-coupled device camera
systems, the nonconfocal laser scanners, and the
confocal laser scanners. The advantages and
disadvantages of each system are listed in Table 1.

Because a typical spot on a microarray can 
contain > 10^ molecules, it is clear that a large 
variation in signal strength may occur. 
Current scanners cannot work across this 
many orders of magnitude (4 or 5 is more typ- 
ical). However, the scanning parameters can 
normally be adjusted to collect more or less
signal, such that two or three scans of the same 
array should permit the detection of rare and 
abundant genes. 

When a microarray is scanned, the
fluorescent images are captured by software normally
included with the scanner. Several commercial
suppliers provide additional software for
quantifying array images, but the software tools are
constantly evolving to meet the developing
needs of researchers, and it is prudent to
define one's own needs and clarify the exact
capabilities of the software before its purchase.
Issues that should be considered include the
following:

• Can the software locate offset spots?

• Can it quantitate across irregular
hybridization signals?

• Can the arrayed genes be programmed in for
easy identification and location?

• Can the software connect via the Internet to
databases containing further information on
the gene(s) of interest?

One of the key issues raised at the
workshop was the sensitivity of microarray
technology. Experiments by General Scanning, Inc.
(Watertown, MA), have shown that by using
the Cy dyes and their scanner, signal can be
detected down to levels of < 1 fluor molecule
per square micrometer, which translates to
detecting a rare message at approximately one
copy per cell or less.

Array Applications 

Although arrays are an emerging technology
certain to undergo improvement and
alteration, they have already been applied
usefully to a number of model systems. Arrays are
at their most powerful when they contain the
entire genome of the species they are being
used to study. For this reason, they have strong
support among researchers utilizing yeast and
Caenorhabditis elegans (5). The genomes of
both of these species have been sequenced and,
in the case of yeast, deposited onto arrays for
examination of gene expression (6,7). With
both of these species, it is relatively easy to
perturb individual gene expression. Indeed, C.
elegans knockouts can be made simply by
soaking the worms in an antisense solution of
the gene to be knocked out.

CCD, charge-coupled device.
From Kawasaki ( 75).

By a process of systematic gene
disruption, it is now possible to examine the cause
and effect relationships between different
genes in these simple organisms. This kind of
approach should help elucidate biochemical
pathways and genetic control processes,
deconvolute polygenic interactions, and
define the architecture of the cellular network.
A simple case study of how this can be
achieved was presented by Butow [University
of Texas Southwestern Medical Center,
Dallas, TX (Figure 2)]. Although it is the
phenotypic result of a single gene knockout
that is being examined, the effect of such
perturbation will almost always be polygenic.
Polygenic interactions will become
increasingly important as researchers begin to move
away from single gene systems when
examining the nature of toxicologic responses to
external stimuli. This is especially important
in toxicology because the phenotype
produced by a given environmental insult is
never the result of the action of a single gene;
rather, it is a complex interaction of one or
multiple cellular pathways. Phenomena such
as quantitative trait (the continuous variation
of phenotype), epistasis (the effect of alleles of
one or more genes on the expression of other
genes), and penetrance (proportion of
individuals of a given genotype that display a
particular phenotype) will become increasingly
evident and important as toxicologists push
toward the ultimate goal of matching the
responses of individuals to different
environmental stimuli.

Analysis of the transcriptome (the
expression level of all the genes in a given cell
population) was a use of arrays addressed by several
speakers. Unfortunately, current gene
nomenclature is often confusing in that single genes
are allocated multiple names (usually as a result
of independent discovery by different
laboratories), and there was a call for standardization of
gene nomenclature. Nevertheless, once a
transcriptome has been assembled it can then be
transferred onto arrays and used to screen any
chosen system. The EPA MicroArray
Consortium (EPAMAC) is assembling testes
transcriptomes for human, rat, and mouse. In a
slightly different approach, Nuwaysir et al. (fiji
describe how the NIEHS assembled what is
effectively a "toxicological transcriptome," a
library of human and mouse genes that have
previously been proven or implicated in
responses to toxicologic insults. Clontech
Laboratories, Inc. (Palo Alto, CA), has begun a
similar process by developing stress/toxicology
filter arrays of rat, mouse, and human genes.
Thus, rather than being tissue or cell specific,
these stress/toxicology arrays can be used across
a variety of model systems to look for
alterations in the expression of toxicologically
important genes and define the new field of
toxicogenomics. The potential to identify
toxicant families based on tissue- or cell-specific
gene expression could revolutionize drug
testing. These molecular signatures or fingerprints
could not only point to the possible
toxicity/carcinogenicity of newly discovered
compounds (Figure 3), but also aid in
elucidating their mechanism of action through
identification of gene expression networks. By
extension, such signatures could provide easily
identifiable biomarkers to assess the degree, time,
and nature of exposure.
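
The idea of comparing a test compound's expression profile against the signatures of known toxicant families can be sketched as follows. This is an illustration rather than a method described in the text: the use of Pearson correlation as the similarity measure, the 0.9 cutoff, the five-gene profiles, and all numeric values are assumptions, and the family names are used only as labels.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a flat profile carries no pattern to match
    return cov / (norm_x * norm_y)


def best_matching_family(profile, signatures, min_r=0.9):
    """Return (family, r) for the best match, or (None, r) if r is below min_r."""
    best_family, best_r = None, -1.0
    for family, signature in signatures.items():
        r = pearson(profile, signature)
        if r > best_r:
            best_family, best_r = family, r
    return (best_family if best_r >= min_r else None), best_r


# Hypothetical log2 expression ratios for the same five genes in each profile.
signatures = {
    "peroxisome proliferators": [2.0, 1.5, -1.0, 0.0, 0.5],
    "oxidant stressors": [-1.0, 0.0, 2.0, 1.5, -0.5],
    "polycyclic aromatic hydrocarbons": [0.0, -1.5, 0.5, 2.0, 1.0],
}
test_compound_1 = [2.0, 1.5, -1.0, 0.0, 0.5]
test_compound_2 = [1.0, 1.0, 1.0, 1.0, -1.0]

for name, profile in [("test compound 1", test_compound_1),
                      ("test compound 2", test_compound_2)]:
    family, r = best_matching_family(profile, signatures)
    print(f"{name}: best match = {family}, r = {r:.2f}")
```

A real screen would use hundreds or thousands of genes per profile, and the cutoff would need to be set empirically from compounds of known class.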

DNA arrays are primarily a tool for
examining differential gene expression in a given
model. In this context they are referred to as
closed systems because they lack the ability of
other differential expression technologies, e.g.,
differential display and subtractive
hybridization, to detect previously unknown genes not
present on the array. This would appear to
limit the power of DNA arrays to the
imaginations and preconceptions of the researcher in
selecting genes previously characterized and
thought to be involved in the model system.
However, the various genome sequencing
projects have created a new category of
sequence, the EST, that has partially
mollified this deficiency. ESTs are cDNAs expressed
in a given tissue that, although they may share
some degree of sequence similarity to
previously characterized genes, have not been assigned
specific genetic identity. By incorporating EST
clones into an array, it is possible to monitor
the expression of these unknown genes. This
can enable the identification of previously
uncharacterized genes that may have biologic
significance in the model system. Filter arrays
from Research Genetics and slide arrays from
Incyte Pharmaceuticals both incorporate large
numbers of ESTs from a variety of species.

A further use of microarrays is the
identification of single nucleotide polymorphisms
(SNPs). These genomic variations are abundant
(they occur approximately every 1 kb or
so) and are the basis of restriction fragment
length polymorphism analysis used in forensic
analysis. Affymetrix, Inc., designed chips that
contain multiple repeats of the same gene
sequence. Each position is present with all four
possible bases. After the hybridization of the
sample, the degree of hybridization to the
different sequences can be measured and the exact
sequence of the target gene deduced. SNPs are
thought to be of vital importance in drug
metabolism and toxicology. For example,
single base differences in the regulatory region or
active site of some genes can account for huge
differences in the activity of that gene. Such
SNPs are thought to explain why some people
are able to metabolize certain xenobiotics
better than others. Thus, arrays provide a further
tool for the toxicologist investigating the
nature of susceptible subpopulations and
toxicologic response.
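
The deduction step described above, in which each position is probed with all four bases and the sequence is read from the relative hybridization, can be sketched as below. This is a simplified illustration and not the vendor's algorithm: the rule of calling the brightest base only when it exceeds the runner-up by a fixed ratio, the 1.5 ratio itself, and all signal values are assumptions.

```python
BASES = "ACGT"


def call_sequence(intensities, min_ratio=1.5):
    """Call one base per position from four probe signals.

    `intensities` is a list of dicts mapping each base to its hybridization
    signal. A position is called 'N' when the brightest probe is not at
    least `min_ratio` times the second brightest.
    """
    called = []
    for position in intensities:
        ranked = sorted(BASES, key=lambda base: position[base], reverse=True)
        top, runner_up = position[ranked[0]], position[ranked[1]]
        called.append(ranked[0] if top >= min_ratio * runner_up else "N")
    return "".join(called)


def differences(reference, called):
    """Return (1-based position, reference base, called base) for confident mismatches."""
    return [(i + 1, ref, obs)
            for i, (ref, obs) in enumerate(zip(reference, called))
            if obs != "N" and obs != ref]


# Hypothetical signals for a four-base stretch of a target gene.
probes = [
    {"A": 900, "C": 100, "G": 80, "T": 120},
    {"A": 90, "C": 850, "G": 110, "T": 100},
    {"A": 100, "C": 120, "G": 400, "T": 350},  # ambiguous position
    {"A": 130, "C": 90, "G": 100, "T": 950},
]
sequence = call_sequence(probes)
print("called sequence:", sequence)
print("candidate SNPs vs. reference ACGG:", differences("ACGG", sequence))
```

Ambiguous positions are reported as `N` rather than forced to a base, so that a weak signal is not mistaken for a polymorphism.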

There are still many wrinkles to be ironed 
out before arrays become a standard tool for 
toxicologists. The main issues raised at the 
workshop by those with hands-on experience
were the following: 

• Expense: the cost of purchasing/contracting 
this technology is still too great for many 
individual laboratories. 




Figure 2. Potential effects of gene knockout within 
positively and negatively regulated gene expression
networks, is limiting in wild type for expression of 
i^. {A) A simple, two-component linear regulatory 
network operating on gene i^. where iy is a positive 
effector of and is either a positive or negative 
effector of ly This network could be deduced by 
examining the consequence of (6) deleting on the 
expression of and ij, where the expression of 
would be decreased or increased depending on 
whether was a positive or negative regulator. 
These and other connected components of even 
greater complexity could be revealed by genome- 
wide expression analysis. From Butow ( ik. 



• Clones: the logistics of identifying, obtaining,
and maintaining a set of nonredundant,
noncontaminated, sequence-verified,
species/cell/tissue/field-specific clones.

• Use of inbred strains: where whole-organism
models are being used, the use of inbred
strains is important to reduce the potentially
confusing effects of the individual variation
typically seen in outbred populations.

• Probe: the need for relatively large amounts
of RNA, which limits the type of sample
(e.g., biopsy) that can be used. Also, different
RNA extraction methods can give different
results.

• Specificity: the ability to discriminate
accurately between closely related genes (e.g., the
cytochrome P450 family) and splice variants.

• Quantitation: the quantitation of gene
expression using gene arrays is still open to
debate. One reason for this is the different
incorporation of the labeling dyes. However,
the main difficulty lies in knowing what to
normalize against. One option is to include a
large number of so-called housekeeping genes
in the array. However, the expression of these
genes often changes depending on the tissue
and the toxicant, so it is necessary to
characterize the expression of these genes in the
model system before utilizing them. This is
clearly not a viable option when screening
multiple new compounds. A second option
is to include on the array genes from a
nonrelated species (e.g., a plant gene on an animal
array) and to spike the probe with synthetic
RNA(s) complementary to the gene(s).

• Reproducibility: this is sometimes
questionable, and a figure of approximately two or
three repeats was used as the minimum
number required to confirm initial findings.
Again, however, most people advocated the
use of Northern blots or reverse transcriptase
PCR to confirm findings.

• Sensitivity: concerns were voiced about the
number of target molecules that must be
present in a sample for them to be detected on
the array.

• Efficiency: reproducible identification of 1.5-
to 2-fold differences in expression was
reported, although the number of genes that
undergo this level of change and remain
undetected is open to debate. It is important
that this level of detection be ultimately
achieved because it is commonly perceived
that some important transcription factors
and their regulators respond at such low
levels. In most cases, 3- to 5-fold was the
minimum change that most were happy to
accept.

• Bioinformatics: perhaps the greatest concern
was how to interpret the data with
the greatest accuracy and efficiency. The
biggest headache is trying to identify
networks of gene expression that are common to
different treatments or doses. The amount of
data from a single experiment is huge. It may
be that, in the future, several groups
individually equipped with specialized software
algorithms for studying their favorite genes or
gene systems will be able to share the same
hybridized chips. Thus, arrays could usher in
a new perspective on collaboration and the
sharing of data.
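
Several of the points above (normalizing against spiked-in controls from a nonrelated species, a minimum accepted change of about 3-fold, and at least two repeats) can be combined in a short sketch. It is only an illustration under stated assumptions: the gene and spike names, the intensity values, the use of the median spike ratio as the scaling factor, and the counting rule are all hypothetical and not taken from the workshop.

```python
import statistics


def spike_in_scale(treated, control, spike_ids):
    """Factor that brings the median treated/control ratio of the spike-ins to 1."""
    ratios = [treated[g] / control[g] for g in spike_ids]
    return 1.0 / statistics.median(ratios)


def fold_changes(treated, control, spike_ids):
    """Spike-normalized treated/control ratio for every non-spike gene."""
    scale = spike_in_scale(treated, control, spike_ids)
    return {g: treated[g] * scale / control[g]
            for g in treated if g not in spike_ids}


def confirmed_changes(replicates, spike_ids, min_fold=3.0, min_repeats=2):
    """Genes changed at least `min_fold` (up or down) in at least `min_repeats` repeats."""
    counts = {}
    for treated, control in replicates:
        for gene, fc in fold_changes(treated, control, spike_ids).items():
            if fc >= min_fold or fc <= 1.0 / min_fold:
                counts[gene] = counts.get(gene, 0) + 1
    return sorted(g for g, n in counts.items() if n >= min_repeats)


# Hypothetical signals from two repeat hybridizations (treated, control).
spikes = {"plant_spike_1", "plant_spike_2"}
repeat_1 = ({"plant_spike_1": 2000, "plant_spike_2": 1000, "geneA": 8000, "geneB": 1100},
            {"plant_spike_1": 1000, "plant_spike_2": 500, "geneA": 1000, "geneB": 500})
repeat_2 = ({"plant_spike_1": 1000, "plant_spike_2": 500, "geneA": 3500, "geneB": 1500},
            {"plant_spike_1": 1000, "plant_spike_2": 500, "geneA": 1000, "geneB": 500})

print("confirmed:", confirmed_changes([repeat_1, repeat_2], spikes))
```

This simplified counting rule does not check that a gene changes in the same direction in every repeat; a stricter version would track increases and decreases separately.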

EPAMAC 

Perhaps the main reason most scientists are
unable to use array technology is the high cost
involved, whether buying off-the-shelf
membranes, using contract printing services, or
producing chips in-house. In view of this,
researchers at the RTD/NHEERL initiated
the EPAMAC. This consortium brings
together scientists from the EPA and a
number of extramural labs with the aim of
developing microarray capability through the
sharing of resources and data. EPAMAC
researchers are primarily interested in the
developmental and toxicologic changes seen
in testicular and breast tissue, and a portion
of the workshop was set aside for EPAMAC
members to share their ideas on how the
experimental application of microarrays could
facilitate their research. One of the central
areas of interest to EPAMAC members is the
effect of xenobiotics on male fertility and
reproductive health. Of greatest concern is
the effect of exposure during critical periods
of development and germ cell differentiation
(50, and how this may compromise sperm
counts and quality following sexual
maturation (10). As well as spermatogenic tissue,
there is also interest in how residual mRNA
found in mature sperm (11) could be used as
an indicator of previous xenobiotic effects (it
is easier to obtain a semen sample than a
testicular biopsy). Arrays will be used to examine
and compare the effect of exposure to heat
and chemicals in testicular and epididymal
gene expression profiles, with the aim of
establishing relationships/associations
between changes in developmental landmarks
and the effects on sperm count and quality.
Cluster, pattern, and other analysis of such
data should help identify hidden relationships
between genes that may reveal potential
mechanisms of action and uncover roles for
genes with unknown functions.

[Figure 3 panel labels: test compound 1; toxicant family; oxidant stressors; polycyclic aromatic hydrocarbons]

Figure 3. Gene expression profiles (also called fingerprints or signatures) of known toxicants or
toxicant families may, in the future, be used to identify the potential toxicity of new drugs, etc. In this
example, the genetic signature of test compound 1 is identical to that of known peroxisome proliferators,
whereas that of test compound 2 does not match any known toxicant family. Based on these results, test
compound 2 would be retained for further testing and test compound 1 would be eliminated.

Summary 

The full impact of DNA arrays may not be
seen for several years, but the interest shown at
this regional workshop indicates the high level
of interest that they foster. Apart from
educating and advertising the various technologies in
this field, this workshop brought together a
number of researchers from the Research
Triangle Park area who are already using DNA
arrays. The interest in sharing ideas and
experiences led to the initiation of a Triangle array
user's group.

Array technology is still in its infancy. This
means that the hardware is still improving and
there is no current consensus for standard
procedures, quantitation, and interpretation.
Consistency in spotting and scanning arrays is
not yet optimized, and this is one of the most
critical requirements of any experiment. In
addition, one of the dark regions of array
technology (strife in the courts over who owns
what portions of it) has further muddled the
future and is a potential barrier toward the
development of consensus procedures.

Perhaps the greatest hurdle for the
application of arrays is the actual interpretation of
data. No specialists in bioinformatics attended
the workshop, largely because they are rare and
because as yet no one seems clear on the best
method of approaching data analysis and
interpretation. Cross-referencing results from
multiple experiments (time, dose, repeats, different
animals, different species) to identify
commonly expressed genes is a great challenge. In most
cases, we are still a long way from
understanding how the expression of gene X is related to
the expression of gene Y, and ordering gene
expression to delineate causal relationships.

To the ordinary scientist in the typical
laboratory, however, the most immediate
problem is a lack of affordable instrumentation.
One can purchase premade membranes at
relatively affordable prices. Although these
may be useful in identifying individual genes
to pursue in more detail using other methods,
the numbers that would be required for even a
small routine toxicology experiment prohibit
this as a truly viable approach. For the
toxicologist, there is a need to carry out multiple
experiments: dose responses, time curves,
multiple animals, and repeats. Glass-based
DNA arrays are most attractive in this context
because they can be prepared in large batches
from the same DNA source and
accommodate control and treated samples on the same
chip. Another problem with current
off-the-shelf arrays is that they often do not contain
one or more of the particular genes a group is
interested in. One alternative is to obtain
or produce a set of custom clones and
have contract printing of membranes or slides
carried out by a company such as Genomic
Solutions, Inc. (Ann Arbor, MI). This approach
is less expensive than laying out capital for
one's own entire system, although at some
point it might make economic sense to print
one's own arrays.

Finally, DNA arrays are currently a team
effort. They are a technology that uses a wide
range of skills including engineering, statistics,
molecular biology, chemistry, and
bioinformatics. Because most individuals are skilled in
only one or perhaps two of these areas, it
appears that success with arrays may be best
expected by teams of collaborators consisting
of individuals having each of these skills.

Those considering array applications may 
be amused or goaded on by the following 
quote from Fortune magazine (12): 

Microprocessors have reshaped our economy,
spawned vast fortunes and changed the way we live.
Gene chips could be even bigger.

Although this comment may have been
designed to excite the imagination rather than
accurately reflect the truth, it is fair to say that
the age of functional genomics is upon us.
DNA arrays look set to be an important tool in
this new age of biotechnology and will likely
contribute answers to some of toxicology's
most fundamental questions.

Subject: RE: [Fwd: Toxicology Chip]
Date: Mon, 3 Jul 2000 08:09:45 -0400
From: "Afshari.Cynthia" <afshari@niehs.nih.gov>
To: "'Diana Hamlet-Cox'" <dianahc@incyte.com>

Vo-^ car. see zhe list of clones that we have o-, o— ---p a- 

nrtp: aa-ue-.-i er^.r.ih.=c-' rjiss cuesr r- ^^ esrch.rf ' 

he selected a suaset ot genes <2 o6oK> that we be- le^-e eS — 

respor^se and iaasic cell-^lar processes a-d ariri*^ Z 11- «rZ-"" " 

t-s. We have included a set of con?r^ ««ef (Bo!? -^1- C^*" *V ^-'V 

the IvTiG?.: because they did not changr^crSS" ll'gl ^^'oH-fv 

!!^-:::!:-:?.: ^---ve found .ha-. so« ;MLse*fl«e8*;?k--e 

signficant-y after tox treatments and are in the p-ocess c' "ool'f-- * . 

variation of each of these 80- genes across oJr e^e"^"i 

^-"w"?? = cnanginQ and being updatel ;;;d we iope -a- c - 

data will lead us to what the toxchip should really be.

I hope this answers your question.

Cindy Afshari

> From: Diana Hamlet-Cox

> Sent: Monday, June 26, 2000 8:52 PM

> To: afshari@niehs.nih.gov

> Subject: [Fwd: Toxicology Chip]

> Dear Dr. Afshari,
> 

> 

> Canj«?u help, me in chis matzer? 1 don't nme*d ra Jrnnw -h- ^ _ 

> are i»i;:g L-sed. e.g.. GPCRs fmore specific?;, ior chMi^". 

> Diana Hamlet-Cox 
> 

> Original Message

> Subject: Toxicology Chip

> Date: Mon, 19 Jun 2000 18:31:48 -0700

> From: Diana Hamlet-Cox <dianahc@incyte.com>

> Organization: Incyte Pharmaceuticals

> To: grigg@niehs.nih.gov
>

> Dear Colleague:

> I am doing literature research on the use of expressed genes as

> ?^'^oSr:S::d!^ T/'^-'^ "l' t^^^' Pres^^^eierL'Sred^-ebruary 

> k^nJi* ^ ^* "^^^ °^ ^^^^ - would 'ixe -o 

> resource X can access tor you could provide^r-Ja- 

> ^^tfjir* "•^''^ chac are on your H^n ToTctzp 

> sJlJt l ^' ^" Py-^^^'^r, X am interested in the crizeriaTsed to 
: !nc!uder?;;^L"J:^^^^^ ^"^^-^^^-^ "--^ seguen^es^^ 

> rhanJc you for your assistance in =ht* r-^.— 



> This emdil messmge zs for zhm s ie use of zhe lr.zer.ded rer^p^e: 

> may conrai.- czr.fidezzial and privileged irforrrarior: susjecr 

> azzomey'-elier.z privilege. Any unaurijcrirec rei-e*-, use. 

> diszribiizioz is prohiiirec. Tf you are noz zhe i: 

> please conrar: zhe sender ifv' reply esail and deszroy 

> original

>
>

Proteomics: a major new
technology for the drug 
discovery process 

Martin J. Page, Bob Amess, Christian Rohlff, Colin Stubberfield 
and Raj Parekh 



Proteomics is a new enabling technology that is being 
integrated into the drug discovery process. This will 
facilitate the systematic analysis of proteins across any 
biological system or disease, forwarding new targets 
and information on mode of action, toxicology and sur- 
rogate markers. Proteomics is highly complementary to 
genomic approaches in the drug discovery process and, 
for the first time, offers scientists the ability to integrate 
information from the genome, expressed mRNAs, their 
respective proteins and subcellular localization. It is ex- 
pected that this will lead to important new insights into 
disease mechanisms and improved drug discovery 
strategies to produce novel therapeutics. 

Among the major pharmaceutical and biotechnol- 
ogy companies, it is clearly recognized that the 
business of modern drug discovery is a highly 
competitive process. All of the many steps in- 
volved are inherently complex, and each can involve a 
high risk of attrition. The players in this business strive 
continuously to optimize and streamline the process; each 
seeking to gain an advantage at every step by attempting 
to make informed decisions at the earliest stage possible. 
The desired outcome is to accelerate as many key activities 
in the drug discovery process as possible. This should
produce a new generation of robust drugs that offer a high
probability of success and reach the clinic and market 
ahead of the competition. 

There has been noticeable emphasis over recent years 
for companies to aggressively review and refine their 
strategies to discover new drugs. Central to this has been 
the introduction and implementation of cutting-edge 
technologies. Most, if not all, companies have now inte-
grated key technology platforms that incorporate gen-
omics, mRNA expression analysis, relational databases, 
high-throughput robotics, combinatorial chemistry and 
powerful bioinformatics. Although it is still early days to 
quantify the real impact of these platforms in clinical and 
commercial terms, expectations are high, and it is widely 
accepted that significant benefits^1 will be forthcoming. This
is largely based on data obtained during preclinical studies
where the genomic^2 and microarray^3,4 technologies have
already proved their value. 

However, there are several noteworthy outcomes that re- 
sult from this. Many comments are voiced that scientists 
armed with these technologies are now commonly faced 
with data overload. Thus, in some instances, rather than 
facilitating the decision process, the accumulation of more 
complex data points, many with unknown consequences, 
can seem to hinder the process. Also, most drug compa- 
nies have simultaneously incorporated very similar compo- 
nents of the new technology platforms, the consequence 
being that it is becoming difficult yet again to determine 
where a clear competitive advantage will arise. Finally, in 
recent years, largely as a result of the accessibility of the 
technologies, there has been an overwhelming emphasis 
placed on genomic and mRNA data rather than on protein
analysis. It is important to remember that proteins dictate
biological phenotype - whether it is normal or diseased -
and are the direct targets for most drugs.

Figure 1. Steps involved in analysing a biological sample by proteomics. MCI, molecular cluster index.

Proteomics: new technology for
the analysis of proteins 

It is now timely to recognize that complementary technol- 
ogy in the form of high-throughput analysis of the total 
protein repertoire of chosen biological samples, namely 
proteomics, is poised to add a new and important dimen- 
sion to drug discovery. In a similar fashion to genomics, 
which aims to profile every gene expressed in a cell, pro-
teomics seeks to profile every protein that is expressed.
However, there is added information, since proteomics can 
also be used to identify the post-translational modifications 
of proteins^, which can have profound effects on bio- 
logical function, and their cellular localization. Importantly, 
proteomics is a technology that integrates the significant 
advances in two-dimensional (2D) electrophoretic separa- 
tion of proteins, mass spectrometry and bioinformatics. 
With these advances it is now possible to consistently de- 
rive proteomes that are highly reproducible and suitable 
for interrogation using advanced bioinformatic tools. 

There are many variations whereby different laboratories 
operate proteomics. For the purpose of this review, the
process used at Oxford GlycoSciences (OGS), which uses
an industrial-scale operation that is integral to its drug dis-
covery work, will be described. The individual steps of
this process, where up to 1000 2D gels can be run and 
analysed per week, are summarized in Fig. 1. The incom- 
ing samples are bar coded and all information relevant to 
the sample is logged into a Laboratory Information 
Management System (LIMS) database. There can be a wide 
range in the type of samples processed, as applicable to 
individual steps in the drug discovery pipeline, and these 
will be mentioned later. The samples are separated accord-
ing to their charge (pI) in the first dimension, using iso-
electric focusing, followed by size (MW) using SDS-PAGE
in the second dimension. Many modifications have been 
made to these steps to improve handling, throughput and 
reproducibility. The separated proteins are then stained 
with fluorescent dyes which are significantly more sensi- 
tive in detection than standard silver methods and have a 
broader dynamic range. The image of the displayed pro- 
teins obtained is referred to as the proteome, and is digi- 
tally scanned into databases using proprietary software 
called ROSETTA™. The images are subsequently curated,
which begins with the removal of any artefacts, cropping
and the placement of pI/MW landmarks. The replicate
images are then aligned and matched to one
another to generate a synthetic composite image. This is
an important step, as the proteome is a dynamic situation, 
and it captures the biological variation that occurs, such 
that even orphan proteins are still incorporated into the 
analysis. 

By means of illustration, Fig. 1 shows the process
whereby proteomes are generated from normal and dis- 
ease samples and how differentially expressed proteins are 
identified. The potential of this type of analysis is tremen- 
dous. For example, from a mammalian cell sample, in ex- 
cess of 2000 proteins can typically be resolved within the 
proteome. The quality of this is shown in Fig. 2, which 
shows representative proteomes from three diverse bio- 
logical sources: human serum, the pathogenic fungus 
Candida albicans and the human hepatoma cell line 
Huh7.

Use of proteomics to identify
disease specific proteins 

In most cases, the drug discovery process is initiated by 
the identification of a novel candidate target - almost al- 
ways a protein - that is believed to be instrumental in the 
disease process. To date, there is a variety of means 
whereby drug targets have been forthcoming. These in- 
clude molecular, cellular and genomic approaches, mostly 
centred upon DNA and mRNA analysis. The gene in ques- 
tion is isolated, and expression and characterization of its 
coded protein product - i.e. the drug target - is invariably 
a secondary event. 

With the proteomic approach, the starting point is at the 
other end of the 'telescope'. Here there is direct and im-
mediate comparison of the proteomes from paired normal
and disease materials. Examples of these pairs are: (1) pu-
rified epithelial cell populations derived from human
breast tumours, matched to purified normal populations of
human breast epithelial cells, and (2) the invading patho-
genic hyphal form of C. albicans, matched to the non-
invading yeast form of C. albicans. When the proteome
images from each pair are aligned, the Proteograph™ soft-
ware is able to rapidly identify those proteins (each refer-
enced as having a unique molecular cluster index, or MCI) 
that are either unique, or those that are differentially ex- 
pressed. Thus, the Proteograph output from this analysis is 
both qualitative and quantitative. 

Proteograph analysis for a particular study can also be 
undertaken on any number of samples. For example, one 
might compare anything from a few to several hundred 
preparations or samples, each from a normal and disease 
counterpart, and have these analysed in a single 
Proteograph study. In this way, it is possible to assign 
strong statistical confidence to the data and in some in- 
stances to identify specific subpopulations within the input 
biological sources. This feature will become increasingly
significant in the near future, and there is a clear synergy 
here whereby proteomics can work closely with pharma- 
cogenomic approaches to stratify patient populations and 
achieve effective targeted care for the patient. Whatever 
the source of the materials, the net output of Proteograph 
analysis is immediate identification of disease specific pro- 
teins. This is shown in Fig. 3, which shows the results of 
a proteograph obtained by comparing untreated human 
hepatoma cells with cells following exposure to a clinical
cytotoxic agent. In this instance, only the top 20 differen-
tially expressed MCIs are shown, but the readout would
normally extend to a defined cut-off value, typically a two-
fold or greater difference in expression levels, determined
by the user.

Figure 2. Representative proteomes obtained from (a) human serum, (b) the pathogenic fungus Candida albicans
and (c) the human hepatoma cell line Huh7.

Figure 3. Table of differential protein expression
profiles, referred to as a Rosetta Proteograph™,
between Huh7 cells with and without the cytotoxic
agent 5-FU. Bars are quantized and do not represent
exact fold change values.

In a typical analysis involving disease and normal mam-
malian material, in which each proteome would have
~2000 protein features each assigned an MCI, the proteo-
graph might identify somewhere in the region of 50-300
MCIs that are unique or differentially expressed. To capi-
talize rapidly on these data, at OGS a high-throughput
mass spectrometry facility coupled to advanced databases
to annotate these MCIs as individual proteins is applied. As 
these are all disease specific proteins, each could represent 
a novel target and/or a novel disease marker. The process 
becomes even more powerful when a panel of features, 
rather than individual features, are assigned. The relevance 
of this is apparent when one considers that most diseases, 
if not all, are multifactorial in nature and arise from poly- 
genic changes. Rather than analysing events in isolation, 
the ability to examine hundreds or thousands of events 
simultaneously, as shown by proteomics, can offer real 
advantages. 
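
The cut-off-based readout described above (features that are unique to one sample, or that differ by a user-defined factor such as twofold) can be illustrated with a minimal sketch. It is not the Proteograph software: the function name, the dictionary layout and the intensity values are hypothetical, and spot intensities are simply assumed to be positive numbers keyed by MCI.

```python
def differential_mcis(normal, disease, cutoff=2.0):
    """Return (unique, changed) features for two proteomes.

    normal, disease: dicts mapping a hypothetical MCI id to a spot
    intensity (> 0). 'unique' lists MCIs seen in only one sample;
    'changed' lists (MCI, disease/normal ratio) pairs at or beyond the cut-off.
    """
    unique = sorted(set(normal) ^ set(disease))  # present in one sample only
    changed = []
    for mci in sorted(set(normal) & set(disease)):
        ratio = disease[mci] / normal[mci]
        # Treat up- and down-regulation symmetrically.
        fold = ratio if ratio >= 1.0 else 1.0 / ratio
        if fold >= cutoff:
            changed.append((mci, round(ratio, 2)))
    return unique, changed


# Illustrative intensities only; not data from the article.
normal = {"MCI-1": 100.0, "MCI-2": 40.0, "MCI-3": 10.0, "MCI-4": 55.0}
disease = {"MCI-1": 105.0, "MCI-2": 10.0, "MCI-3": 30.0, "MCI-5": 20.0}
unique, changed = differential_mcis(normal, disease)
print(unique)
print(changed)
```

Raising the cut-off (for example to fivefold) narrows the list to the most strongly altered features, at the cost of missing smaller changes.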

Identification and assignment of candidate targets 
The rapid identification and assignment of candidate tar- 
gets and markers represents a huge challenge, but this has 
been greatly facilitated by combining the recent advances 
made in proteomics and analytical mass spectrometry^9.
Using automated procedures it is now possible to annotate 
proteins present in femtomole quantities, which would de- 
pict the low abundance class of proteins. The process of 
annotation is similarly aided by the quality and richness of 
the sequence specific databases that are currently avail-
able, both in the public domain and in the private sector
(e.g. those supplied by Incyte Pharmaceuticals). In this re- 
spect, the advances in proteomics have benefited consider- 
ably from the breakthroughs achieved with genomics. 

From an application perspective, cancer studies provide a 
good opportunity whereby proteomics can be instrumental 
in identifying disease specific proteins, because it is often 
feasible to obtain normal and diseased tissue from the same 
patient. For example, proteomic studies have been re-
ported on neuroblastomas^10, human breast proteins from
normal and tumour sources^11-13, lung tumours^14, colon tu-
mours^15 and bladder tumours^16. There are also proteomic
studies reported within the cardiovascular therapeutic area,
in which disease or response proteins are identified^17,18.

Genomic microarray analysis can similarly identify 
unique species or clusters of mRNAs that are disease spe- 
cific. However, in some instances, there is a clear lack of 
correlation between the levels of a specific mRNA and its 
corresponding protein (Ref. 19, Gygi, S.P. et al., submit-
ted). This has now been noted by many investigators and
reaffirms that post-transcriptional events, including protein
stability, protein modification (such as phosphorylation,
glycosylation, acylation and methylation) and cell localiz-
ation, can constitute major regulatory steps. Proteomic
analysis captures all of these steps and can therefore pro-
vide unique and valuable information independent from,
or complementary to, genomic data.

Proteomics for target validation and signal
transduction studies

The identification of disease specific proteins alone is in-
sufficient to begin a drug screening process. It is critical to
assign function and validation to these proteins by con- 
firming they are indeed pivotal in the disease process. 
These studies need to encompass both gain- and loss-of- 
function analyses. This would determine whether the activity 
of a candidate target (an enzyme, for example), eliminated 
by molecular/cellular techniques, could reverse a disease 
phenotype. If this happened, then the investigator would 
have increased confidence that a small-molecule inhibitor 
against the target would also have a similar effect. The 
proposal of candidate drug targets is often not a difficult 
process, but validating them is another matter. Validation 
represents a major bottleneck where the wrong decision 
can have serious consequences^20.

Proteomics can be used to evaluate the role of a chosen 
target protein in signal transduction cascades directly rel- 
evant to the disease. In this manner, valuable information 
is forthcoming on the signalling pathways that are per- 
turbed by a target protein and how they might be cor- 
rected by appropriate therapeutics. Techniques that are 
well established in one-dimensional protein studies to in- 
vestigate signalling pathways, such as western blotting 
and immunoprecipitation, are highly suited to proteomic 
applications. For example, the proteomes obtained can be
blotted onto membranes and probed with antibodies
against the target protein or related signalling mol-
ecules^21-23. Because proteomics can resolve >2000 pro-
teins on a single gel, it is possible to derive important 
information on specific isoforms (such as glycosylated or 
phosphorylated variants) of signalling molecules. This will 
result in characterization of how they are altered in the 
disease process. Western immunoblotting techniques 
using high-affinity antibodies will typically identify pro-
teins present at ~10 copies per cell (~1.7 fmol); this is in
contrast to the best fluorescent dyes currently available 
that are limited to imaging proteins at 1000 or more 
copies per cell. The level of sensitivity derived by these 
applications will greatly facilitate interpretation of com- 
plex signalling pathways and contribute significantly to 
validation of the target under study. 
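
As a rough check on the figures above, a copy number per cell can be converted to a molar amount by multiplying by the number of cells analysed and dividing by Avogadro's number. The sketch below assumes a sample of 10^8 cells, a value chosen here only because it reproduces the 1.7 fmol quoted for about 10 copies per cell; the cell number and the function name are illustrative assumptions and are not stated in the text.

```python
AVOGADRO = 6.022e23  # molecules per mole


def copies_to_fmol(copies_per_cell, n_cells):
    """Convert copies per cell to femtomoles for a sample of n_cells cells."""
    moles = copies_per_cell * n_cells / AVOGADRO
    return moles * 1e15  # 1 mol = 1e15 fmol


# Assumed sample size of 1e8 cells (not given in the article).
immunoblot_limit = copies_to_fmol(10, 1e8)    # about 10 copies per cell
dye_limit = copies_to_fmol(1000, 1e8)         # about 1000 copies per cell
print(f"{immunoblot_limit:.2f} fmol, {dye_limit:.0f} fmol")
```

Under the same assumption, the hundredfold difference in copy number between immunoblotting and fluorescent staining translates directly into a hundredfold difference in the amount of protein that must be present on the gel.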

Immunoprecipitation studies 

Similarly, immunoprecipitation studies are another useful 
way to exploit the resolving power of proteomics^24,25. In
this instance, very large quantities of protein (e.g. several
milligrams) can be subjected to incubation with antibodies
against chosen signalling molecules. This allows high-affin-
ity capture of these proteins, which can subsequently be
eluted and electrophoresed on a 2D gel to provide a high- 
resolution proteome of a specific subset of proteins. 
Detection by blot analysis allows the identification of ex- 
tremely small amounts of defined signalling molecules. 
Again, the different isoforms of even very low abundance 
proteins can be seen, and, very importantly, the technique 
allows the investigator to identify multiprotein complexes 
or other proteins that co-precipitate with the target protein. 
These coassociating proteins frequently represent sig-
nalling partners for the target protein, and their identifi-
cation by mass spectrometry can lead to invaluable infor-
mation on the signalling processes involved. 

The depth of signal transduction analysis offered by 
proteomics, and the utility for target validation studies, 
can be extended even further by applying cell fraction-
ation studies^26-28. By purifying subcellular fractions, such
as membrane, nuclear, organelle and cytosolic, it is possi-
ble to assign a localization to proteins of interest and to
follow their trafficking in a cell. Enrichment of these frac-
tions will also allow much higher representation of low
abundance proteins on the proteome. Their detection by
fluorescent dyes or immunoblot techniques will lead to
the identification of proteins in the range of 1-10 copies
per cell, putting the sensitivity on a par with genomic
approaches.

These signal transduction analyses can be of additional 
value in experiments where inhibitors derived from a 
screening programme against the target are being evalu-
ated for their potency and selectivity. The inhibitors can
encompass small molecules, antisense nucleic acid con- 
structs, dominant-negative proteins, or neutralizing anti- 
bodies microinjected into cells. In each case, proteome 
analysis can provide unique data in support of validation 
studies for a chosen candidate drug target. 

Proteomics and drug mode-of-action studies 

Once a validated target is committed to a screening regi- 
men to identify and advance a lead molecule, it is impor- 
tant to confirm that the efficacy of the inhibitor is through 
the expected mechanism. Such mode-of-action studies are 
usually tackled by various cell biological and biochemical 
methods. Proteomics can also be usefully applied to these 
studies and this is illustrated below by describing data ob- 
tained with OGT719. This is a novel galactosyl derivative of 
the cytotoxic agent 5-fluorouracil (5-FU), which is currently 
being developed by OGS for the treatment of hepatocel- 
lular carcinoma and colorectal metastases localized 
in the liver. The premise underpinning the design and ra-
tionale of OGT719 was to derive a 5-FU prodrug capable
of targeting, and being retained in, cells bearing the asialo-
glycoprotein receptor (ASGP-r), including hepatocytes^29,
hepatoma Huh7 cells^30 and some colorectal tumour cells^31.
The growth of the human hepatoma cell line Huh7 is in-
hibited by 5-FU or by OGT719. If the inhibition by
OGT719 were the result of uptake and conversion to 5-FU
as the active component, then it would be expected that
Huh7 cells would show similar proteome profiles follow-
ing exposure to either drug.

Figure 4. Features that are specifically up- or downregulated in Huh7 cells by either 5-fluorouracil (5-FU) or
OGT719: (a) elongation factor 1α2, (b) novel (three peptides by MS-MS) and (c) α-subunit of prolyl-4-hydroxylase.
Arrows indicate up- or downregulated. Panels: (a) OGT719, (b) 5-FU, (c) 5-FU/OGT719.

To examine these possibilities, we conducted an experi- 
ment taking samples of Huh7 cells that had been treated 
with IC3Q doses of either OGT719 or 5-FU. Total cell lysates 
were prepared and taken through 2D electrophoresis, 
fluorescence staining, digital imaging and Proteograph 
analysis. To facilitate the interpretation of the data across 
all of the 2291 features seen on the proteomes, drug-
induced protein changes of fivefold or greater, identified
by the Proteograph, were analysed further. Interestingly, 
from this analysis 19 identical proteins were changed five- 
fold or more by both drugs, strongly suggesting similarities 
in the mode of action for these two compounds. 
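
The comparison described above amounts to finding the features that pass the fivefold threshold under both treatments. A minimal sketch follows; the feature ids, the signed fold-change convention (+6 meaning sixfold up, -6 meaning sixfold down) and the values are hypothetical, and the requirement that a feature change in the same direction under both drugs is an added assumption rather than something stated in the text.

```python
def changed_features(fold_changes, cutoff=5.0):
    """Map feature id -> 'up'/'down' for signed fold changes at or beyond cutoff."""
    return {
        fid: ("up" if fc > 0 else "down")
        for fid, fc in fold_changes.items()
        if abs(fc) >= cutoff
    }


def co_modulated(drug_a, drug_b, cutoff=5.0):
    """Features changed by both drugs, in the same direction (assumption)."""
    a = changed_features(drug_a, cutoff)
    b = changed_features(drug_b, cutoff)
    return sorted(fid for fid in a.keys() & b.keys() if a[fid] == b[fid])


# Illustrative values only; not data from the article.
ogt719 = {"F1": 6.2, "F2": -5.5, "F3": 1.4, "F4": 7.0}
five_fu = {"F1": 8.1, "F2": -9.0, "F3": 5.3, "F4": -6.0}
shared = co_modulated(ogt719, five_fu)
print(shared)
```

An empty result for a given cell line would correspond to the situation in which no proteins are co-modulated by the two drugs.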

Thus, from very complex data involving >2000 protein 
features, using proteomics it is possible to analyse quanti- 
tatively and qualitatively each protein during its exposure 
to drugs. The biologist is now able to focus a series of fur- 
ther studies specifically on an enriched subset of proteins. 



Figure 4 shows highlighted examples of the selected areas 
of the proteome where some of these identified proteins in 
the above study are altered in response to either or both 
drugs.

Several of the proteins identified above as being modu- 
lated similarly by 5-FU or OGT719 in Huh7 cells were sub- 
jected to tandem mass-spectrometric analysis for anno- 
tation. Some of these, such as the nuclear ribosomal 
RNA-binding protein^32, can be placed into pyrimidine
pathways or related cell cycle/growth biochemical path- 
ways in which 5-FU is known to act. 

To attribute further significance to the proteome mode- 
of-action studies with OGT719, another cell line, the rat 
sarcoma HSN, was used. Growth of these cells is inhibited 
by 5-FU, but they are completely refractory to OGT719; 
notably they lack the ASGP-r, which might explain this 
finding (unpublished). For our proteome studies, HSN 
cells were treated with 5-FU or OGT719 over a time course 
of one, two and four days. At each time point, cells were 
harvested and processed to derive proteomes and 
Proteographs. As before, we purposely focused on those 
proteins that increased or decreased by fivefold or more. 
In this instance, there were no proteins co-modulated by 
the two drugs. This is perhaps to be expected, given that 
the HSN cells are killed by 5-FU and yet are refractory to 
OGT719.

Clear potential

The above is just an example of how proteomics can be 
used to address the mode of action of anticancer drugs. 
The potential of this approach is clear, and one can envis- 
age situations where it will be profitable to compare the 
proteomes of cells in which the drug target has been elimi- 
nated by molecular knockout techniques, or with small- 
molecule inhibitors believed to act specifically on the same 
target. In addition to using proteomics to examine the ac- 
tion of drugs, it is also possible to use this approach to 
gauge the extent of nonspecific effects that might eventu- 
ally lead to toxicity. For instance, in the example used 
above with HSN cells treated with OGT719, although cell 
growth was not affected, the levels of several specific pro- 
teins were changed. Further investigation of these proteins 
and the signalling pathways in which they are involved 
could be illuminating in predicting the likelihood or other- 
wise of long-term toxicity. 

Use of proteomics in formal drug
toxicology studies

A drug discovery programme at the stage where leads 
have been identified and mode-of-action studies are ad- 
vanced, will proceed to investigate the pharmacokinetic 
and toxicology profile of those agents. These two param-
eters are of major importance in the drug discovery
process, and many agents that have looked highly promis- 
ing from in vitro studies have subsequently failed because 
of insurmountable pharmacokinetic and/or toxicity prob- 
lems in vivo. Whereas the pharmacokinetic properties of a 
molecule can now be characterized quickly and accu- 
rately, toxicity studies are typically much longer and more 
demanding in their interpretation. 

The ability to achieve fast and accurate predictions of 
toxicity within an in vivo setting would represent a big 
step forward in accelerating any drug discovery pro- 
gramme. Toxicity from a drug can be manifested in any 
organ. However, because the liver and kidney are the 
major sites in the body responsible for metabolism and 
elimination of most drugs, it is informative to examine 
these particular organs in detail to provide early indi- 
cations about events that might result in toxicity. 

The basis for most xenobiotic metabolizing activity is to 
increase the hydrophilicity of the compound and so facili- 
tate its removal from the body. Most drugs are metabo- 
lized in the liver via the cytochrome P450 family of en- 
zymes, which are known to comprise a total of ~200
different members^33,34, encompassing a wide array of
overlapping specificities for different substrates. In addi-
tion to clearance, they also play a major role in metabo-
lism that can lead to the production and removal of toxic
species, and in some instances it is possible to correlate 
the ability or failure to remove such a toxin with a specific 
P450 or subgroup. 

Unique P450 profiles 

Each individual person will have a slightly different P450 
profile, largely from polymorphisms and changes in ex- 
pression levels, although other genetic and environmental 
factors aside from P450 also need to be taken into consid- 
eration. A significant amount of research is currently 
being directed towards this field - known as pharmacoge- 
nomics - with the aim of predicting how a patient will re- 
spond to a drug, as determined by their genetic make-
up^35-37. The marked variation of individuals in their ability
to clear a compound can be one of the key factors in de- 
ciding the overall pharmacokinetic profile of a drug. Not 
only will this have a bearing on the likelihood of a patient 
responding to a treatment, but it will also be a factor in 
determining the possibility of their experiencing an ad- 
verse effect. 

Many pharmaceutical companies are already employing 
genomic approaches, involving P450 measurements, as a 
key step in their assessment of the toxicological profile of 
a candidate drug and therefore of its suitability, or other- 
wise, to be considered for human clinical trials. There are 
limits to this approach, however. Whereas the P450 mRNA
profiling can predict with some accuracy the likely meta-
bolic fate of a drug, it will not provide information on
whether the metabolites would subsequently lead to tox- 
icity. Besides the patient-to-patient differences in steady- 
state levels of the P450s, there are also characteristic induc- 
tion responses of these enzymes to some drugs. Moreover, 
as there can be some doubt over the correlation of mRNA 
levels and the corresponding protein levels, there is scope 
for misinterpretation of the results and hence real advan- 
tages to be gained from a proteome approach. In both in- 
stances, the ability to examine entire proteome profiles, in- 
cluding the P450 proteins, will be a significant advantage 
in understanding and predicting the metabolism and 
toxicological outcome of drugs. 

In addition to direct organ and tissue studies, the serum, 
which collects the majority of toxicity markers released 
from susceptible organs and tissues throughout the entire 
body, can be utilized. Serum is rich in nuclease activity 
and, as pharmacogenomics is not suited to deal with these 
samples, valuable markers of toxicity could go undetected. 
However, by using proteomics for these types of analyses, 
serum markers (and clusters thereof) are now accessible
for evaluation as indicators of toxicity. 



DDT Vol. 4, No. 2 February 1999






Pharmacoproteomics 

Proteomics can thus be used to add a new sphere of 
analysis to the study of toxicity at the protein level, and in 
the era of '-omics' there is a case to be made to adopt the
term 'Pharmacoproteomics'. Animals can be dosed with
increasing levels of an experimental drug over time, and 
serum samples can be drawn for consecutive proteome 
analyses. Using this procedure, it should be possible to 
identify individual markers, or clusters thereof, that are 
dose related and correlate with the emergence and severity
of toxicity. Markers might appear in the serum at a defined 
drug dose and time that are predictive of early toxicity 
within certain organs and if allowed to continue will have 
damaging consequences. These serum markers could sub- 
sequently be used to predict the response of each individ- 
ual and allow tailoring of therapy whereby optimal effi- 
cacy is achieved without adverse side effects being 
apparent. This application can obviously extend to track- 
ing toxicity of drugs in clinical trials where serum can be 
readily drawn and analysed. Surrogate markers for drug ef- 
ficacy could also be detected by this procedure and could 
facilitate the challenge of identifying patient classes who 
will respond favourably to a drug and at what dosage. 

Conclusions

By contrast to the agents administered to patients in clini- 
cal wards, the process of drug discovery is not a prescrip- 
tive series of steps. The risks are high and there are long 
timelines to be endured before it is known whether a can- 
didate drug will succeed or fail. At each step of the drug 
discovery process there is often scope for flexibility in in- 
terpretation, which over many steps is cumulative. The 
pharmaceutical companies most likely to succeed in this 
environment are those that are able to make informed 
accurate decisions within an accelerated process. 

The genomics revolution has impacted very positively 
upon these issues and now has a powerful new partner in 
proteomics. The ability to undertake global analysis of pro- 
teins from a very wide diversity of biological systems and 
to interrogate these in a high-throughput, systematic man- 
ner will add a significant new dimension to drug discov- 
ery. Each step of the process from target discovery to clini- 
cal trials is accessible to proteomics, often providing 
unique sets of data. Using the combination of genomics 
and proteomics, scientists can now see every dimension of 
their biological focus, from genes, mRNA, proteins and 
their subcellular localization. This will greatly assist our 
understanding of the fundamental mechanistic basis of 
human disease and allow new improved and speedier 
drug discovery strategies to be implemented. 



REFERENCES 

1 Crooke, S.T. (1998) Nat. Biotechnol. 16, 29-30

2 Dykes, C.W. (1996) Br. J. Clin. Pharmacol. 42, 683-695

3 Schena, M. et al. (1998) Trends Biotechnol. 16, 301-306

4 Ramsay, G. (1998) Nat. Biotechnol. 16, 40-44

5 Anderson, N.L. and Anderson, N.G. (1998) Electrophoresis 19, 1853-1861

6 James, P. (1997) Biochem. Biophys. Res. Commun. 231, 1-6

7 Wilkins, M.R. et al. (1996) Biotechnol. Genet. Eng. Rev. 13, 19-50

8 Parekh, R.B. and Rohlff, C. (1997) Curr. Opin. Biotechnol. 8, 718-723

9 Figeys, D. et al. (1998) Electrophoresis 19, 1811-1818

10 Wimmer, K. et al. (1996) Electrophoresis 17, 1741-1751

11 Giometti, C.S., Williams, K. and Tollaksen, S.L. (1997) Electrophoresis 18, 573-581

12 Williams, K. et al. (1998) Electrophoresis 19, 333-343

13 Rasmussen, R.K. et al. (1998) Electrophoresis 19, 818-825

14 Hirano, T. et al. (1995) Br. J. Cancer 72, 840-848

15 Ji, H. et al. (1997) Electrophoresis 18, 605-613

16 Ostergaard, M. et al. (1997) Cancer Res. 57, 4111-4117

17 Patel, V.B. et al. (1997) Electrophoresis 18, 2788-2794

18 Arnott, D. et al. (1998) Anal. Biochem. 258, 1-18

19 Anderson, L. and Seilhamer, J. (1997) Electrophoresis 18, 533-537

20 Rastan, S. and Beeley, L.J. (1997) Curr. Opin. Genet. Dev. 7, 777-783

21 Gravel, P. et al. (1995) Electrophoresis 16, 1152-1159

22 Qian, Y. et al. (1997) Clin. Chem. 43, 352-359

23 Sanchez, J.C. et al. (1997) Electrophoresis 18, 638-641

24 Watts, A.D. et al. (1997) Electrophoresis 18, 1086-1091

25 Asker, N. et al. (1995) Biochem. J. 308, 873-880

26 Ramsby, M.L., Makowski, G.S. and Khairallah, E.A. (1994) Electrophoresis 15, 265-277

27 Huber, L.A. (1995) FEBS Lett. 369, 122-125

28 Corthals, G.L. et al. (1997) Electrophoresis 18, 317-323

29 Hubbard, A.L., Wall, D.A. and Ma, A. (1983) J. Cell Biol. 96, 217-229

30 Zeng, F.Y., Oka, J.A. and Weigel, P.H. (1996) Biochem. Biophys. Res. Commun. 218, 325-330

31 Mu, J-Z. et al. (1994) Biochim. Biophys. Acta 1222, 483-491

32 Ghoshal, K. and Jacob, S.T. (1997) Biochem. Pharmacol. 53, 1569-1575

33 Guengerich, F.P. and Parikh, A. (1997) Curr. Opin. Biotechnol. 8, 623-628

34 Rendic, S. and Di Carlo, F.J. (1997) Drug Metab. Rev. 29, 413-580

35 Vermes, A., Guchelaar, H.J. and Koopmans, R.P. (1997) Cancer Treat. Rev. 23, 321-339

36 Housman, D. and Ledley, F.D. (1998) Nat. Biotechnol. 16, 492-493

37 Persidis, A. (1998) Nat. Biotechnol. 16, 209-210






IN THE UNITED STATES PATENT AND TRADEMARK OFFICE 



DECLARATION OF TOD BEDILION, Ph.D.
UNDER 37 C.F.R. § 1.132 

I, TOD BEDILION, Ph.D., declare and state as follows:

1. In April, 1996, I became the first employee of
Synteni, Inc., where I served as Research Director until its
acquisition by Incyte Corporation in early 1998. After
Synteni's acquisition, I continued in the position of Director
of Corporate Development at Incyte until May 11, 2001. I am
currently the Director of Business Development at Genomic
Health, Inc., Redwood City, California and an occasional
Consultant to Incyte.

2. Synteni was founded to commercialize expression 
microarrays, microarrays in which expressed nucleic acids — 
full-length cDNAs, fragments of full-length cDNAs, expressed
sequence tags (ESTs) — are arrayed on a common support to 
permit highly parallel detection and measurement of the 
expression of their cognate genes in a biological sample. 

3. During my employ at Synteni, virtually all (if 
not all) of my work efforts were directed to the further 
technical development and the commercial exploitation of that 
microarray technology; given the small size of our shop, most 
of us had both technical and commercial responsibilities. The
customer accounts for which I was personally responsible
included large pharmaceutical companies, such as SmithKline
Beecham, large biotechnology companies, such as Genentech, and
small research institutes, such as DNAX Inc.

4. From my very first interaction with our
customers consistently through to Synteni's acquisition by
Incyte, I heard uniform, consistent, and emphatic requests
that more genes be added to the arrays. This was true with
respect to both our original microarrays, based on
customer-provided genes and libraries, and our later, "generic", gene
expression microarrays, based upon the UniGene clone
collection (our so-called "UniGEM" arrays). From day 1 the
pressure on us was to print ever more spots on the array. It
was never a question: our customers wanted ever more genes on
the array, each new gene-specific probe providing
incrementally more value to the customer.^1

5. As a commercial enterprise, providing value to
our customers was our major concern. Thus, to increase the
value of our products and services in the marketplace - to
increase our ability to sell our microarrays and microarray
services, their "salability" - our efforts from the very
beginning were devoted to increasing the number of specific
genes whose expression could be detected with our microarrays.

6. Indeed, one of our major competitive advantages
in the marketplace - not only as regards other commercial
suppliers, but also with respect to the innumerable
laboratories and companies that were attempting to spot arrays
[in their own] facilities - was the number of
distinct gene-specific probes that we provided on our
expression microarrays. Our first 10,000 element UniGEM array
put the holy grail of gene expression analysis - the human
whole genome array - within sight for the very first time.
(With respect to timing of the UniGEM program, we began project
planning and technology development in mid 1996 and delivered
our first 10,000 element standard content human arrays in the
first months of 1997, as I recall.)

^1 [...] encoded gene product was known, but were [...] asking for
probes specific to any and all expressed genes.

7. By the end of 1997, our efforts to provide the
most comprehensive, and thus most valuable, human gene 
expression microarrays had been sufficiently successful that 
Incyte agreed to acquire Synteni for a reported $80 million. 

8. I declare further that all statements made 
herein of my own knowledge are true and that all statements
made on information and belief are believed to be true, and 
further that these statements were made with the knowledge 
that willful false statements and the like so made are 
punishable by fine or imprisonment, or both, under 
Section 1001 of Title 18 of the United States Code and may 
jeopardize the validity of any patent application in which 
this declaration is filed or any patent that issues thereon. 



Tod Bedilion, Ph.D. 




IN THE UNITED STATES PATENT AND TRADEMARK OFFICE



DECLARATION OF VISHWANATH R. IYER, Ph.D.
UNDER 37 C.F.R. § 1.132



I, VISHWANATH R. IYER, Ph.D., declare and state as 

follows : 

1. I am an Assistant Professor in the Section of 
Molecular Genetics and Microbiology, Institute of Cellular and 
Molecular Biology, University of Texas at Austin, where my 
laboratory currently studies global transcriptional control in 
yeast, gene expression programs during human cell 
proliferation, and genome-wide transcription factor targets in 
yeast and human. Immediately prior to this position, I spent 
four years as a postdoctoral fellow in the laboratory of 
Patrick O. Brown at Stanford University studying the
transcriptional programs of yeast and of human cells. My 
curriculum vitae is attached hereto as Exhibit A. 

2. Beginning in Dr. Brown's laboratory, where I 
helped to develop the first whole genome arrays for yeast and 
early versions of highly representative cDNA arrays for human 
cells, and continuing to the present day, I have used
microarray-based gene expression analysis as a principal
approach in much of my research. 

3. Representative publications describing this 
work include: 



DeRisi J. et al., "Exploring the metabolic
and genetic control of gene expression on a genomic
scale," Science 278:680-686 (1997);^1

Marton et al., "Drug target validation and
identification of secondary drug target effects
using DNA microarrays," Nature Med. 4:1293-1301
(1998);^2

Iyer et al., "The transcriptional program in the
response of human fibroblasts to serum,"
Science 283:83-87 (1999);^3 and

Ross et al., "Systematic variation in gene
expression patterns in human cancer cell lines,"
Nature Genetics 24:227-235 (2000).^4

Two of the papers describe our use of microarray-based
expression profiling to explore the metabolic reprogramming
that occurs during major environmental changes, both in yeast
(DeRisi et al., during the shift from fermentation to
respiration) and in human cells (Iyer et al., human
fibroblasts exposed to serum). One reference describes our
use of expression profile analysis in drug target validation
and identification of secondary drug effects (Marton et al.),
and one describes our use of expression profiling as a
molecular phenotyping tool to discriminate among human cancer
cells (Ross et al.).

4. Whether used to elucidate basic physiological 
responses, to study primary and secondary drug effects, or to 
discriminate and classify human cancers, expression profiling 
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Attached hereto as Exhibit B. 
Attached hereto as Exhibit C. 
Attached hereto as Exhibit D. 
Attached hereto as Exhibit E. 



as we have practiced it relies for [...].

5. [...] demonstrated that we can
use the presence or absence of a characteristic drug
"signature" pattern of altered [gene expression in]
cells to explore the mechanism [of drug action, and to identify]
secondary effects that can signal potentially deleterious drug
side effects. As another example, [we have] demonstrated that
gene expression patterns can be used to classify human tumor
cell lines. While it is of course advantageous to know the
biological function of the encoded gene products in order to
reach a better understanding of the cellular mechanisms
underlying these results, the [...] [do not]
require knowledge of the biological function of the encoded
proteins.

6. The resolution of the patterns used in such
comparisons is determined by the number of genes detected: the
greater the number of genes detected, the higher the
resolution of the pattern. It goes without saying that higher
resolution patterns are generally more useful in such
comparisons than lower resolution patterns. With such higher
resolution comes a correspondingly higher degree of
statistical confidence for distinguishing different patterns,
as well as identifying similar ones.

7. Each gene included as a probe on a microarray
provides a signal that is specific to the cognate transcript,
at least to a first approximation.^5 Each new gene-specific
probe added to a microarray thus increases the number of genes
detectable by the device, increasing the resolving power of
the device. As I note above, higher resolution patterns are
generally more useful in comparisons than lower resolution
patterns. Accordingly, each new gene probe added to a
microarray increases the usefulness of the device in gene
expression profiling analyses. This proposition is so
well-established as to be virtually an axiom in the art, and has
been as long as I have been working in the field, and
certainly since the time I embarked on the production of whole
genome arrays in early 1996. Simply put, arrays with fewer
gene-specific probes are inferior to arrays with more
gene-specific probes.

^5 In a more nuanced view, it is [...]

(Continued...)

8. For example, our ability to subdivide cancers
into discriminable classes by expression profiling is limited
by the resolution of the patterns produced. With more genes
contributing to the expression patterns, we can potentially
draw finer distinctions among the patterns, thus subdividing
otherwise indistinguishable cancers into a greater number of
classes; the greater the number of classes, the greater the
likelihood that the cancers classified together will respond
similarly to therapeutic intervention, permitting better
individualization of therapy and, we hope, better treatment
outcomes.

9. If a gene does not change expression in an
experiment, or if a gene is not expressed and produces no
signal in an experiment, that is not to say that the probe
lacks usefulness on the array; it only means that an
insufficient number of conditions have been sampled to
identify expression changes. In fact, an experiment showing
that a gene is not expressed or that its expression level does
not change can be equally informative. To provide maximum
versatility as a research tool, the microarray should
include - and as a biologist I would want my microarray to
include - each newly identified gene as a probe.

(...Continued)

[...] without discriminating among them, and [...] the presence
of a variety of allelic variants of a single [gene], without
discriminating among them.

10. I declare further that all statements made 
herein of my own knowledge are true and that all statements 
made on information and belief are believed to be true, and 
further that these statements were made with the knowledge 
that willful false statements and the like so made are 
punishable by fine or imprisonment, or both, under 

Section 1001 of Title 18 of the United States Code and may

jeopardize the validity of any patent application in which 
this declaration is filed or any patent that issues thereon. 



October 20, 2003 
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EXHIBIT A 

Docket No.: PC-0044 CIP 
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Vishwanath R. Iyer 

Assistant Professor 

Section of Molecular Genetics and Microbiology 

Institute of Cellular and Molecular Biology 

MBB3.212A, University of Texas at Austin

Austin, TX 78712-0159 

Phone: 512-232-7833 

Fax: 512-232-3432 

Email: vishy@mail.utexas.edu 

Education/Training 

Bombay University, Mumbai, India B.Sc. (1987), Chemistry & Biochemistry

M. S. University of Baroda, Baroda, India M.Sc. (1989), Biotechnology 

Harvard University, Cambridge MA Ph.D. (1996), Genetics 

Stanford University, Stanford CA Post-doctoral (1996-2000), Genomics 

Research Experience 

9/00-5/03 Assistant Professor, Section of Molecular Genetics and
Microbiology, University of Texas, Austin TX

■ Global transcriptional control in yeast 

■ Gene expression programs during human cell proliferation 

■ Genome-wide transcription factor targets in yeast and human 

■ Collaborative microarray facility 

5/96-8/00 Post-doctoral fellow Stanford University, Stanford CA 
(Advisor: Dr. Patrick O. Brown) 

■ Yeast whole-genome ORF and intergenic microarrays 

■ Human cDNA microarrays for expression profiling 

9/89-4/96 Graduate student Harvard University, Cambridge MA 

(Advisor: Dr. Kevin Struhl) 

■ Yeast transcriptional regulation 

Honours and Awards 

Government of India Biotechnology Fellowship (1987-1989) 
University Grants Commission Junior Research Fellowship (1989) 
Stanford University/NHGRI Genome Training Grant (1996) 

Invited Conference talks (selected) 

Invited Lecturer, NEC-Princeton Lectures in Biophysics 

Princeton, NJ (June 1998) 
Plenary Session Speaker, HGM '99 (HUGO Human Genome Meeting)

Brisbane, Australia (April 1999) 
Invited Speaker, Gordon Research Conference "Human Molecular Genetics" 

Newport, RI (August 2001) 



Invited Speaker, Nature Genetics "Oncogenomics 2002" Conference
Dublin, Ireland (May 2002)

[...] Approaches to Transcriptional
Regulation, Cold Spring Harbor Laboratory Meeting (March 2003)

Symposium Chair and Speaker, "Functional Genomics", American Society for
Biochemistry and Molecular Biology Meeting, San Diego, CA (April 2003)

Invited Speaker in Functional Genomics (Gene Networks) Symposium, International
Congress of Genetics, Melbourne, Australia (July 6-11, 2003)
Invited Speaker, "BioArrays Europe 2003"
Cambridge, UK (Sep/Oct 2003)

Departmental Seminars 

[...] Biochemistry & Biophysics Departments [...]

[...] Department of Biochemistry, [...]

UT Southwestern Medical Center, Human Genetics Seminar Series
May 5, 2002

UCLA School of Medicine, Department of Human Genetics
June 2, 2003

National Human Genome Research Institute
June 12, 2003

Sanger Institute of the Wellcome Trust, Hinxton UK
Sep 2003

Other Professional Activities 

[...] Research, Nature Genetics, Science (1998-

[...] "Making and using DNA Microarrays"

Member, NIDDK Special Emphasis Review Panel ZDK1 (2001-2002)

Publications 

1. Iyer V. & Struhl K. (1995) Poly(dA:dT), a ubiquitous promoter element that
stimulates transcription via its intrinsic DNA structure. EMBO J. 14: 2570-2579.

[...] transcription initiation rates in
Saccharomyces cerevisiae. Proc. Natl. Acad. Sci. (USA) 93: 5208-5212

4. DeRisi J. L., Iyer V. R. & Brown P. O. (1997) Exploring the metabolic and genetic
control of gene expression on a genomic scale. Science 278: 680-686

5. Marton M. J., DeRisi J. L., Bennett H. A., Iyer V. R., Meyer M. R., Roberts C. J.,
Stoughton R., Burchard J., Slade D., Dai H., Bassett Jr. D. E., Hartwell L. H., Brown
P. O. & Friend S. H. (1998) Drug target validation and identification of secondary
drug target effects using DNA microarrays. Nature Med. 4: 1293-1301

6. Lutfiyya L. L., Iyer V. R., DeRisi J., DeVit M. J., Brown P. O. & Johnston M. (1998)
Characterization of three related glucose repressors and genes they regulate in
Saccharomyces cerevisiae. Genetics 150: 1377-1391

7. Spellman P. T., Sherlock G., Zhang M. Q., Iyer V. R., Anders K., Eisen M. B., Brown P.
O., Botstein D. & Futcher B. (1998) Comprehensive identification of cell cycle-regulated
genes of the yeast Saccharomyces cerevisiae by microarray hybridization.
Mol. Biol. Cell 9: 3273-3297

8. Iyer V. R., Eisen M. B., Ross D. T., Schuler G., Moore T., Lee J. C. F., Trent J. M.,
Staudt L. M., Hudson Jr. J., Boguski M. S., Lashkari D., Shalon D., Botstein D. &
Brown P. O. (1999) The transcriptional program in the response of human
fibroblasts to serum. Science 283: 83-87

9. [...] Genomics and array technology. Curr. Opin. Oncol. [...]

10. Ross D. T., Scherf U., Eisen M. B., Perou C. M., Spellman P., Iyer V. R., Rees C.,
Jeffrey S. S., Van de Rijn M., Waltham M., Pergamenschikov A., Lee J. C. F.,
Lashkari D., Shalon D., Myers T. G., Weinstein J. N., Botstein D. & Brown P. O.
(2000) Systematic variation in gene expression patterns in human cancer cell lines.
Nature Genetics 24: 227-235

11. Sudarsanam P., Iyer V. R., Brown P. O. & Winston F. (2000) Whole-genome
expression analysis of snf/swi mutants of Saccharomyces cerevisiae.
Proc. Natl. Acad. Sci. (USA) 97: 3364-3369

12. Tran H. G., Steger D. J., Iyer V. R. & Johnson A. D. (2000) The chromo domain
protein Chd1p from budding yeast is an ATP-dependent chromatin-modifying factor.
EMBO J. 19: 2323-2331

13. Gross C., Kelleher M., Iyer V. R., Brown P. O. & Winge D. R. (2000) Identification
of the copper regulon in Saccharomyces cerevisiae by DNA microarrays. J. Biol.
Chem. 275: 32310-32316

14. Reid J. L., Iyer V. R., Brown P. O. & Struhl K. (2000) Coordinate regulation of yeast
ribosomal protein genes is associated with targeted recruitment of Esa1 histone
acetylase. Mol. Cell 6: 1297-1307

15. Iyer V. R., Horak C. E., Scafe C. S., Botstein D., Snyder M. & Brown P. O. (2001)
Genomic binding sites of the yeast cell-cycle transcription factors SBF and MBF.
Nature 409: 533-538

16. Miki R., Kadota K., Bono H., Mizuno Y., Tomaru Y., Carninci P., Itoh M., Shibata K.,
Kawai J., Konno H., Watanabe S., Sato K., Tokusumi Y., Kikuchi N., Ishii Y.,
Hamaguchi Y., Nishizuka I., Goto H., Nitanda H., Satomi S., Yoshiki A., Kusakabe
M., DeRisi J. L., Eisen M. B., Iyer V. R., Brown P. O., Muramatsu M., Shimada H.,
Okazaki Y. & Hayashizaki Y. (2001) Delineating developmental and metabolic
pathways in vivo by expression profiling using the RIKEN set of 18,816 full-length
enriched mouse cDNA arrays. Proc. Natl. Acad. Sci. 98: 2199-2204

17. [...]

18. Iyer V. R. Microarray-based detection of DNA-protein interactions: Chromatin
Immunoprecipitation on Microarrays, in DNA Microarrays: A Molecular Cloning
Manual, [...] 453-463 (Cold Spring Harbor Laboratory Press)
*(not peer reviewed)

19. Killion P., Sherlock G. and Iyer V. R. (2003) The Longhorn Array Database, an
open-source implementation of the Stanford Microarray Database. BMC
Bioinformatics 4: 32

20. Hahn J. S., Hu Z., Thiele D. J. & Iyer V. R. Genome-Wide Analysis of the Biology of
Stress Responses Through Heat Shock Transcription Factor [...]

21. Kim J. & Iyer V. R. The global role of TBP recruitment to promoters in mediating
gene expression profiles (manuscript in preparation)



Current/Pending Research Support 

U01 AA13518-01 Adron Harris (PI) 25% effort

9/28/01 - 9/27/06

NIH/NIAAA

"INIA; Microarray Core"

[...] Neuroscience Initiative on Alcoholism

[...] to support the use of microarray technology

[...]

Role: Co-investigator



003658-0223-2001 Iyer (PI) 16% effort
01/01/02-08/31/04

Texas Higher Education Coordinating Board (ARP)

"Microarray based global mapping of DNA-protein interactions at promoters in human
[...]"

[...] interactions of transcription factors with human
promoters

Role: PI



Information Technology Research 0325116 R. Mooney (PI) 9% effort
09/01/03-08/31/07
NSF

"[...] Multi-Source Data Mining to Experimentation for Gene Network
Discovery"
Role: Co-investigator



1 R01 CA95548-01A2 (pending) Iyer (PI) 25% effort

12/1/03-11/30/08

NIH

"Analysis of genome-wide transcriptional control in yeast"

[...] targets in yeast through [...]

Role: PI

Breast Cancer Idea Award (pending) Iyer (PI) 10% effort
1/1/04 - 12/31/06

US Army Medical Research and Materiel Command

"Genome-wide chromosomal targets of oncogenic transcription factors"

This is a project aimed at identifying direct chromosomal targets of c-myc and ER in
human cells through the use of a novel sequence tag analysis method.

003658-0531-2003 (pending) Marcotte (PI) 8% effort
01/01/04 - 12/31/05

Texas Higher Education Coordinating Board (ATP)

"[...] high-throughput platform for measuring gene function on a
genomic [...]"
This proposal is aimed at developing a novel microarray based platform for automated
[...] of cells, allowing rapid and systematic evaluation
of [...].
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Seal USA 86. 263 (1991); B. Tsctiersch et aH. 
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Exploring the Metabolic and Genetic Control of 
Gene Expression on a Genomic Scale 

Joseph L. DeRisi, Vishwanath R. Iyer, Patrick O. Brown*

DNA microarrays containing virtually every gene of Saccharomyces cerevisiae were used
to carry out a comprehensive investigation of the temporal program of gene expression
accompanying the metabolic shift from fermentation to respiration. The expression
profiles observed for genes with known metabolic functions pointed to features of the
metabolic reprogramming that occur during the diauxic shift, and the expression patterns
of many previously uncharacterized genes provided clues to their possible functions. The
same DNA microarrays were also used to identify genes whose expression was affected
by deletion of the transcriptional co-repressor TUP1 or overexpression of the transcriptional
activator YAP1. These results demonstrate the feasibility and utility of this approach
to genomewide exploration of gene expression patterns.



The complete sequences of nearly a dozen 
microbial genomes are known, and in the 
next several years we expect to know the 
complete genome sequences of several 
metazoans, including the human genome.
Defining the role of each gene in these 
genomes will be a formidable task, and un- 
derstanding how the genome functions as a 
whole in the complex natural history of a 
living organism presents an even greater 
challenge. 

Knowing when and where a gene is 
expressed often provides a strong clue as to 
its biological role. Conversely, the pattern
of genes expressed in a cell can provide 
detailed information about its state. Al- 
though regulation of protein abundance in 
a cell is by no means accomplished solely 
by regulation of mRNA, virtually all dif- 
ferences in cell type or state are correlated 
with changes in the mRNA levels of many 
genes. This is fortuitous because the only 
specific reagent required to measure the 
abundance of the mRNA for a specific 
gene is a cDNA sequence. DNA microar- 
rays, consisting of thousands of individual 
gene sequences printed in a high-density 
array on a glass microscope slide (1, 2),
provide a practical and economical tool 
for studying gene expression on a very 
large scale (3-6). 

Saccharomyces cerevisiae is an especially

Department of Biochemistry, Stanford University School
of Medicine, Howard Hughes Medical Institute, Stanford,
CA 94305-5428, USA.

*To whom correspondence should be addressed. E-mail:
pbrown@cmgm.stanford.edu



favorable organism in which to conduct a 
systematic investigation of gene expression. 
The genes are easy to recognize in the genome
sequence, cis regulatory elements are
generally compact and close to the tran- 
scription units, much is already known 
about its genetic regulatory mechanisms, 
and a powerful set of tools is available for its 
analysis. 

A recurring cycle in the natural history 
of yeast involves a shift from anaerobic 
(fermentation) to aerobic (respiration) metabolism.
Inoculation of yeast into a medium
rich in sugar is followed by rapid growth
fueled by fermentation, with the production 
of ethanol. When the fermentable sugar is 
exhausted, the yeast cells turn to ethanol as 
a carbon source for aerobic growth. This 
switch from anaerobic growth to aerobic 
respiration upon depletion of glucose,
referred to as the diauxic shift, is correlated
with widespread changes in the expression 
of genes involved in fundamental cellular 
processes such as carbon metabolism, pro- 
tein synthesis, and carbohydrate storage 
(7). We used DNA microarrays to charac- 
terize the changes in gene expression that 
take place during this process for nearly the 
entire genome, and to investigate the ge- 
netic circuitry that regulates and executes 
this program. 

Yeast open reading frames (ORFs) were 
amplified by the polymerase chain reaction 
(PCR), with a commercially available set of 
primer pairs (8). DNA microarrays, con- 
taining approximately 6400 distinct DNA 
sequences, were printed onto glass slides by 
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using a simple robotic printing device (9). 
Cells from an exponentially growing culture
of yeast were inoculated into fresh medium 
and grown at 30°C for 21 hours. After an 
initial 9 hours of growth, samples were har- 
vested at seven successive 2-hour intervals, 
and mRNA was isolated (10). Fluorescently 
labeled cDNA was prepared by reverse transcription
in the presence of Cy3(green)-
or Cy5(red)-labeled deoxyuridine triphosphate
(dUTP) (11) and then hybridized to
the microarrays (12). To maximize the
reliability with which changes in expression
levels could be discerned, we labeled cDNA 
prepared from cells at each successive time 
point with Cy5, then mixed it with a Cy3-labeled
"reference" cDNA sample prepared
from cells harvested at the first interval 
after inoculation. In this experimental de- 
sign, the relative fluorescence intensity 
measured for the Cy3 and Cy5 fluors at
each array element provides a reliable measure
of the relative abundance of the corresponding
mRNA in the two cell populations
(Fig. 1). Data from the series of seven
samples (Fig. 2), consisting of more than
43,000 expression-ratio measurements,
were organized into a database to facilitate 
efficient exploration and analysis of the 
results. This database is publicly available 
on the Internet (13). 

During exponential growth in glucose- 
rich medium, the global pattern of gene 
expression was remarkably stable. Indeed, 
when gene expression patterns between the
first two cell samples (harvested at a 2-hour 
interval) were compared, mRNA levels differed
by a factor of 2 or more for only 19
genes (0.3%), and the largest of these differences
was only 2.7-fold (14). However, as
glucose was progressively depleted from the 
growth media during the course of the ex- 
periment, a marked change was seen in the 
global pattern of gene expression. mRNA 
levels for approximately 710 genes were 
induced by a factor of at least 2, and the 
mRNA levels for approximately 1030 genes 
declined by a factor of at least 2. Messenger 
RNA levels for 183 genes increased by a 
factor of at least 4, and mRNA levels for 
203 genes diminished by a factor of at least 
4. About half of these differentially ex- 
pressed genes have no currently recognized 
function and are not yet named. Indeed, 
more than 400 of the differentially expressed
genes have no apparent homology
to any gene whose function is known (15).
The responses of these previously uncharacterized
genes to the diauxic shift therefore
provide the first small clue to their possible
roles.
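
The fold-change tallies reported above can be illustrated with a short sketch. This is not the authors' analysis code: the function names, gene labels, and intensity values below are hypothetical, and the sketch assumes that background-corrected Cy5 (sample) and Cy3 (reference) intensities are already available for each array element.

```python
def expression_ratio(cy5, cy3):
    """Ratio of sample (Cy5) to reference (Cy3) fluorescence for one array element."""
    if cy5 <= 0 or cy3 <= 0:
        raise ValueError("intensities must be positive")
    return cy5 / cy3


def count_changes(ratios, fold=2.0):
    """Count genes induced or repressed by at least `fold` relative to the reference."""
    induced = sum(1 for r in ratios.values() if r >= fold)
    repressed = sum(1 for r in ratios.values() if r <= 1.0 / fold)
    return induced, repressed


# Hypothetical (Cy5, Cy3) intensities for four array elements.
intensities = {
    "geneA": (4200.0, 1000.0),
    "geneB": (950.0, 1000.0),
    "geneC": (240.0, 1000.0),
    "geneD": (2100.0, 1000.0),
}
ratios = {gene: expression_ratio(c5, c3) for gene, (c5, c3) in intensities.items()}

print(count_changes(ratios, fold=2.0))  # genes changed at least twofold
print(count_changes(ratios, fold=4.0))  # genes changed at least fourfold
```

Raising the threshold from 2 to 4 shrinks both counts, which mirrors the way the text reports separate tallies for factors of at least 2 and at least 4.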

The global view of changes in expression
of genes with known functions provides
a vivid picture of the way in which
the cell adapts to a changing environment.
Figure 3 shows a portion of the yeast
metabolic pathways involved in carbon 
and energy metabolism. Mapping the 
changes we observed in the mRNAs en- 
coding each enzyme onto this framework 
allowed us to infer the redirection in the 
flow of metabolites through this system. 
We observed large inductions of the genes 
coding for the enzymes aldehyde dehydrogenase
(ALD2) and acetyl-coenzyme
A (CoA) synthase (ACS1), which function
together to convert the products of
alcohol dehydrogenase into acetyl-CoA, 
which in turn is used to fuel the tricarbox- 
ylic acid (TCA) cycle and the glyoxylate 
cycle. The concomitant shutdown of tran- 
scription of the genes encoding pyruvate 
decarboxylase and induction of pyruvate 
carboxylase rechannels pyruvate away
from acetaldehyde, and instead to oxalacetate,
where it can serve to supply the
TCA cycle and gluconeogenesis. Induction
of the pivotal genes PCK1, encoding
phosphoenolpyruvate carboxykinase, and
FBP1, encoding fructose 1,6-biphosphatase,
switches the directions of two key
irreversible steps in glycolysis, reversing
the flow of metabolites along the reversible
steps of the glycolytic pathway toward
the essential biosynthetic precursor,
glucose-6-phosphate. Induction of the genes
coding for the trehalose synthase and gly- 
cogen synthase complexes promotes chan- 
neling of glucose-6-phosphate into these 
carbohydrate storage pathways. 

Just as the changes in expression of 
genes encoding pivotal enzymes can pro- 
vide insight into metabolic reprogram- 
ming, the behavior of large groups of func- 
tionally related genes can provide a broad 
view of the systematic way in which the 
yeast cell adapts to a changing environ- 
ment (Fig. 4). Several classes of genes, 
such as cytochrome c-related genes and 
those involved in the TCA/glyoxylate cycle
and carbohydrate storage, were coordinately
induced by glucose exhaustion. In
contrast, genes devoted to protein synthe- 
sis, including ribosomal proteins, tRNA 
synthetases, and translation, elongation, 
and initiation factors, exhibited a coordi- 
nated decrease in expression. More than 
95% of ribosomal genes showed at least 
twofold decreases in expression during the 
diauxic shift (Fig. 4) (13). A noteworthy 
and illuminating exception was that the 



genes encoding mitochondrial ribosomal 
genes were generally induced rather than 
repressed after glucose limitation, highlighting
the requirement for mitochondrial
biogenesis (13). As more is learned about
the functions of every gene in the yeast 
genome, the ability to gain insight into a 
cell's response to a changing environment 
through its global gene expression patterns 
will become increasingly powerful. 

Several distinct temporal patterns of ex- 
pression could be recognized, and sets of 
genes could be grouped on the basis of the 
similarities in their expression patterns. The
characterized members of each of these 
groups also shared important similarities in 
their functions. Moreover, in most cases, 
common regulatory mechanisms could be 
inferred for sets of genes with similar expres- 
sion profiles. For example, seven genes 
showed a late induction profile, with mRNA 
levels increasing by more than ninefold at
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the last timepoint but less than threefold at 
the preceding timepoint (Fig. 5B). All of
these genes were known to be glucose-repressed,
and five of the seven were previously
noted to share a common upstream activating
sequence (UAS), the carbon source response
element (CSRE) (16-20). A search
in the promoter regions of the remaining two
genes, ACR1 and IDP2, revealed that
ACR1, a gene essential for ACS1 activity,
also possessed a consensus CSRE motif, but 
interestingly, IDP2 did not. A search of the 
entire yeast genome sequence for the con- 
sensus CSRE motif revealed only four addi- 
tional candidate genes, none of which 
showed a similar induction. 
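
The "late induction" criterion described above (more than ninefold at the last timepoint, less than threefold at the preceding one) amounts to a simple filter over each gene's time course. The following is an illustrative sketch only; the function name, gene labels, and ratio values are hypothetical and are not data from the study.

```python
def is_late_induced(profile, last_min=9.0, prev_max=3.0):
    """True if the final expression ratio exceeds `last_min` while the
    ratio at the preceding timepoint is still below `prev_max`."""
    if len(profile) < 2:
        raise ValueError("profile needs at least two timepoints")
    return profile[-1] > last_min and profile[-2] < prev_max


# Hypothetical expression ratios over seven timepoints.
profiles = {
    "geneX": [1.0, 1.1, 1.2, 1.5, 2.0, 2.5, 12.0],  # jumps only at the end
    "geneY": [1.0, 1.5, 2.5, 4.0, 6.0, 8.0, 11.0],  # rises gradually
    "geneZ": [1.0, 1.0, 0.9, 1.1, 1.0, 1.2, 2.0],   # barely changes
}

late = sorted(gene for gene, p in profiles.items() if is_late_induced(p))
print(late)
```

A gene that climbs gradually to a high ratio is excluded because it already exceeds the threefold ceiling at the next-to-last timepoint, so the filter isolates only the abrupt, late responders.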

Examples from additional groups of
genes that shared expression profiles are
illustrated in Fig. 5, C through F. The
sequences upstream of the named genes in
Fig. 5C all contain stress response elements
(STRE), and with the exception
of HSP42, have previously been shown to
be controlled at least in part by these
elements (21-24). Inspection of the sequences
upstream of HSP42 and the two
uncharacterized genes shown in Fig. 5C,
YKL026c, a hypothetical protein with
similarity to glutathione peroxidase, and
YGR043c, a putative transaldolase, revealed
that each of these genes also possesses
repeated upstream copies of the stress-responsive
CCCCT motif. Of the 13 additional
genes in the yeast genome that
shared this expression profile [including
HSP30, ALD2, OM45, and 10 uncharacterized
ORFs (25)], nine contained one or
more recognizable STRE sites in their upstream
regions.

Fig. 1. Yeast genome microarray. The actual size of the microarray is 18 mm by 18 mm. The
microarray was printed as described (9). This image was obtained with the same fluorescent
scanning confocal microscope used to collect all the data we report (45). A fluorescently
labeled cDNA probe was prepared from mRNA isolated from cells [...] by reverse transcription in the
presence of Cy3-dUTP. Similarly, a second probe was prepared from mRNA isolated from cells taken
from the same culture 9.5 hours later (culture density of [...] cells/ml, with a glucose level of
[...]) by reverse transcription in the presence of Cy5-dUTP. In this image, hybridization of the
Cy3-dUTP-labeled cDNA [...] is represented as a green
signal, and hybridization of Cy5-dUTP-labeled cDNA (that is, mRNA expression at 9.5 hours) is
represented as a red signal. Thus, genes induced or repressed after the diauxic shift appear in this
image as red and green spots, respectively. Genes expressed at roughly equal levels before and after
the diauxic shift appear in this image as yellow spots.

The heterotrimeric transcriptional activator complex HAP2,3,4 has been
shown to be responsible for induction of several genes important for
respiration (26-28). This complex binds a degenerate consensus sequence
known as the CCAAT box (26). Computer analysis, using the consensus
sequence TNRYTGGB (29), has suggested that a large number of genes
involved in respiration may be specific targets of HAP2,3,4 (30).
Indeed, a putative HAP2,3,4 binding site could be found in the
sequences upstream of each of the seven cytochrome c-related genes that
showed the greatest magnitude of induction (Fig. 5D). Of 12 additional
cytochrome c-related genes that were induced, HAP2,3,4 binding sites
were present in all but one. Significantly, we found that transcription
of HAP4 itself was induced nearly ninefold concomitant with the diauxic
shift.

Control of ribosomal protein biogenesis is mainly exerted at the
transcriptional level, through the presence of a common
upstream-activating element (UASrpg) that is recognized by the Rap1
DNA-binding protein (31, 32). The expression profiles of seven
ribosomal proteins are shown in Fig. 5F. A search of the sequences
upstream of all seven genes revealed consensus Rap1-binding motifs
(33). It has been suggested that declining Rap1 levels in the cell
during starvation may be responsible for the decline in ribosomal
protein gene expression (34). Indeed, we observed that the abundance of
RAP1 mRNA diminished by 4.4-fold, at about the time of glucose
exhaustion.

Of the 149 genes that encode known or putative transcription factors,
only two, HAP4 and SIP4, were induced by a factor of more than
threefold at the diauxic shift. SIP4 encodes a DNA-binding
transcriptional activator that has been shown to interact with Snf1,
the "master regulator" of glucose repression (35). The eightfold
induction of SIP4 upon depletion of glucose strongly suggests a role in
the induction of downstream genes at the diauxic shift.

Although most of the transcriptional responses that we observed were
not previously known, the responses of many genes during the diauxic
shift have been described. Comparison of the results we obtained by DNA
microarray hybridization with previously reported results therefore
provided a strong test of the sensitivity and accuracy of this
approach. The expression patterns we observed for previously
characterized genes showed almost perfect concordance with previously
published results (36). Moreover, the differential expression
measurements obtained by DNA microarray hybridization were reproducible
in duplicate experiments. For example, the remarkable changes in gene
expression between cells harvested immediately after inoculation and
immediately after the diauxic shift (the first and sixth intervals in
this time series) were measured in duplicate, independent DNA
microarray hybridizations. The correlation coefficient for two complete
sets of expression ratio measurements was 0.87, and for more than 95%
of the genes, the expression ratios measured in these duplicate
experiments differed by less than a factor of 2. However, in a few
cases, there were discrepancies between our results and previous
results, pointing to technical limitations that will need to be
addressed as DNA microarray technology advances (37, 38). Despite the
noted exceptions, the high concordance between the results we obtained
in these experiments and those of previous studies provides confidence
in the reliability and thoroughness of the survey.
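
The reproducibility check described above (a correlation coefficient
between duplicate sets of expression ratios, and the fraction of genes
whose duplicate ratios agree within a factor of 2) can be sketched as
follows. This is a minimal illustration, not the authors' procedure:
the function names are hypothetical, and computing the correlation on
log2-transformed ratios is an assumption, since the text does not say
how the coefficient was calculated.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def reproducibility(ratios_a, ratios_b, max_fold=2.0):
    """Compare duplicate expression-ratio measurements gene by gene.

    Returns (correlation of log2 ratios, fraction of genes whose two
    ratios differ by less than `max_fold`).
    Assumption: ratios are positive and listed in the same gene order.
    """
    logs_a = [math.log2(r) for r in ratios_a]
    logs_b = [math.log2(r) for r in ratios_b]
    limit = math.log2(max_fold)
    agreeing = sum(1 for a, b in zip(logs_a, logs_b) if abs(a - b) < limit)
    return pearson(logs_a, logs_b), agreeing / len(logs_a)
```

Working in log space treats a twofold induction and a twofold
repression symmetrically, which is why the factor-of-2 test is written
as a bound on the absolute difference of the logarithms.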

The changes in gene expression during this diauxic shift are complex
and involve integration of many kinds of information about the
nutritional and metabolic state of the cell. The large number of genes
whose expression is altered and the diversity of temporal expression
profiles observed in this experiment highlight the challenge of
understanding the underlying regulatory mechanisms. One approach to
defining the contributions of individual regulatory genes to a complex
program of this kind is to use DNA microarrays to identify genes whose
expression is affected


Fig. 2. The section of the array indicated by the gray box in Fig. 1 is
shown for each of the experiments described here. Representative genes
are labeled. In each of the arrays used to analyze gene expression
during the diauxic shift, red spots represent genes that were induced
relative to the initial timepoint, and green spots represent genes that
were repressed relative to the initial timepoint. In the arrays used to
analyze the effects of the tup1Δ mutation and YAP1 overexpression, red
spots represent genes whose expression was increased, and green spots
represent genes whose expression was decreased by the genetic
modification. Note that distinct sets of genes are induced and
repressed in the different experiments. The complete images of each of
these arrays can be viewed on the Internet (13). Cell density as
measured by optical density (OD) at 600 nm was used to measure the
growth of the culture.



[Fig. 2 panel labels: Growth OD 0.14; Growth OD 0.46]

by mutations in each putative regulatory gene. As a test of this
strategy, we analyzed the genomewide changes in gene expression that
result from deletion of the TUP1 gene. Transcriptional repression of
many genes by glucose requires the DNA-binding repressor Mig1 and is
mediated by recruiting the transcriptional co-repressors Tup1 and
Cyc8/Ssn6 (39). Tup1 has also been implicated in repression of
oxygen-regulated, mating-type-specific, and DNA-damage-inducible genes
(40).

Wild-type yeast cells and cells bearing a deletion of the TUP1 gene
(tup1Δ) were grown in parallel cultures in rich medium containing
glucose as the carbon source. Messenger RNA was isolated from
exponentially growing cells from the two populations and used to
prepare cDNA labeled with Cy3 (green) and Cy5 (red), respectively (11).
The labeled probes were mixed and simultaneously hybridized to the
microarray. Red spots on the microarray therefore represented genes
whose transcription was induced in the tup1Δ strain, and thus
presumably repressed by Tup1. A representative section of the
microarray (Fig. 2, bottom middle panel) illustrates that the genes
whose expression was affected by the tup1Δ mutation were, in general,
distinct from those induced upon glucose exhaustion [complete images of
all the arrays shown in Fig. 2 are available on the Internet (13)].
Nevertheless, 34 (10%) of the genes that were induced by a factor of at
least 2 after the diauxic shift were similarly induced by deletion of
TUP1, suggesting that these genes may be subject to TUP1-mediated
repression by glucose. For example, SUC2, the gene encoding invertase,
and all five hexose transporter genes that were induced during the
course of the diauxic shift were similarly induced, in duplicate
experiments, by the deletion of TUP1.

The set of genes affected by Tup1 in this experiment also included
α-glucosidases, the mating-type-specific genes MFA1 and MFA2, the DNA
damage-inducible RNR2 and RNR4, as well as genes involved in
flocculation and many genes of unknown function. The hybridization
signal corresponding to expression of TUP1 itself was also severely
reduced because of the (incomplete) deletion of the transcription unit
in the tup1Δ strain, providing a positive control in the experiment
(42).

Many of the transcriptional targets of Tup1 fell into sets of genes
with related biochemical functions. For instance, although only about
3% of all yeast genes appeared to be TUP1-repressed by a factor of more
than 2 in duplicate experiments under these conditions, 6 of the 13
genes that have been implicated in flocculation (15) showed a
reproducible increase in expression of at least twofold when TUP1 was
deleted. Another group of related genes that appeared to be subject to
TUP1 repression encodes the serine-rich cell wall mannoproteins, such
as Tip1 and Tir1/Srp1, which are induced by cold shock and other
stresses (43), and similar, serine-poor proteins, the seripauperins
(44). Messenger RNA levels for 23 of the 26 genes in this group were
reproducibly elevated by at least 2.5-fold in the tup1Δ strain, and 18
of these genes were induced by more than sevenfold when TUP1 was
deleted. In contrast, none of 83 genes that could be classified as
putative regulators of the cell division cycle were induced more than
twofold by deletion of TUP1. Thus, despite the diversity of the
regulatory systems that employ Tup1, most of the genes that it
regulates under these conditions fall into a limited number of distinct
functional classes.
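
To make the enrichment argument above concrete, one could ask how
likely it would be to see 6 or more of 13 flocculation genes induced if
each gene independently had only the genome-wide chance of about 3%.
The sketch below uses a simple binomial tail for that question. It is
an illustrative calculation only: the text reports the counts but no
statistical test, and the independence assumption and the function name
are introduced here.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p).

    Assumption: each of the n genes is affected independently with the
    same probability p (taken here as the genome-wide fraction).
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Counts quoted in the text: 6 of 13 flocculation genes, ~3% background.
flocculation_tail = binomial_tail(6, 13, 0.03)
```

Under these assumptions the tail probability is on the order of 10^-6,
which is consistent with the qualitative claim that Tup1 targets
cluster in a few functional classes rather than being spread evenly
across the genome.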

Because the microarray allows us to monitor expression of nearly every
gene in yeast, we can, in principle, use this approach to identify all
the transcriptional targets of a regulatory protein like Tup1. It is
important to note, however, that in any single experiment of this kind
we can only recognize those target genes that are normally repressed
(or induced) under the conditions of the experiment. For instance, the
experiment described here analyzed a MATα strain in which MFA1 and
MFA2, the genes encoding the a-factor mating pheromone precursor, are
normally repressed. In the isogenic tup1Δ strain, these genes were
inappropriately expressed, reflecting the role that Tup1 plays in their
repression. Had we instead carried out this experiment with a MATa
strain (in which expression of MFA1 and MFA2 is not repressed), it
would not have been possible to conclude anything regarding the role of
Tup1 in the repression of these genes. Conversely, we cannot
distinguish indirect effects of the chronic absence of Tup1 in the
mutant strain from effects directly attributable to its participation
in repressing the transcription of a gene.

Another simple route to modulating the activity of a regulatory factor
is to overexpress the gene that encodes it. YAP1 encodes a DNA-binding
transcription factor belonging to the b-zip class of DNA-binding
proteins. Overexpression of YAP1 in yeast confers increased resistance
to hydrogen peroxide, o-phenanthroline, heavy metals, and osmotic
stress (45). We analyzed differential gene expression between a
wild-type strain bearing a control plasmid and a strain with a plasmid
expressing YAP1 under the control of the strong GAL1-10 promoter, both
grown in galactose (that is, a condition that induces YAP1
overexpression). Complementary DNA from the control and YAP1
overexpressing strains, labeled with Cy3 and Cy5, respectively, was
prepared from mRNA isolated from the two strains and hybridized to the
microarray. Thus, red spots on the array represent genes that were
induced in the strain overexpressing YAP1.

Of the 17 genes whose mRNA levels increased by more than threefold when
YAP1 was overexpressed in this way, five bear homology to aryl-alcohol
oxidoreductases [illegible] four of the genes in this set also belong
to the general class of dehydrogenases/oxidoreductases. Very little is
known about the role of aryl-alcohol oxidoreductases in [illegible]
isolated from ligninolytic fungi, in which they participate in coupled
redox reactions, oxidizing aromatic and aliphatic unsaturated alcohols
to aldehydes with the production of hydrogen peroxide (46, 47). The
fact that a remarkable fraction of the targets identified in this
experiment belong to the same small, functional group of
oxidoreductases suggests that these genes [illegible]
Fig. 4. Coordinated regulation of functionally related genes. The
curves represent the average induction or repression ratios for all the
genes in each indicated group. The total number of genes in each group
was as follows: ribosomal proteins, 112; translation elongation and
initiation [illegible]; glycogen and trehalose synthesis [illegible];
[illegible] reductase proteins, 19; and TCA- and glyoxylate-cycle
enzymes, 24.

[illegible] metabolism.

[illegible] (TTACTAA or TGACTAA) in the sequences upstream of the
target genes we identified ([citation illegible]). About two-thirds of
the genes that were induced by more than threefold upon Yap1
overexpression had one or more binding sites within 600 bases upstream
of the start codon (Table 1), suggesting [illegible] Yap1. The absence
of canonical Yap1-binding sites upstream of the others may reflect an
ability of Yap1 to bind sites that differ from the canonical binding
sites, perhaps in cooperation with other factors, or, less likely, may
represent an indirect effect of Yap1 overexpression, mediated by one or
more intermediary factors. Yap1 sites were found only four times in the
corresponding region of an arbitrary set of 30 genes that were not
differentially regulated by Yapl.
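
The search for canonical binding sites (TTACTAA or TGACTAA) within 600
bases upstream of the start codon can be sketched as a simple pattern
scan. This is a minimal illustration under stated assumptions: the
function and variable names are hypothetical, the input is taken to be
the upstream sequence written 5' to 3' and ending just before the start
codon, and scanning both strands is a choice made here that the text
does not specify.

```python
import re

# Both canonical sites differ only at the second base: T[T/G]ACTAA.
# The lookahead lets overlapping occurrences be reported as well.
YAP1_SITE = re.compile(r"(?=(T[TG]ACTAA))")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sites(upstream, window=600):
    """List (strand, offset, site) for each site in the last `window` bases.

    Offsets are 0-based within the scanned strand; for the "-" strand
    they refer to the reverse-complemented window.
    """
    region = upstream.upper()[-window:]
    hits = []
    for strand, seq in (("+", region), ("-", reverse_complement(region))):
        for match in YAP1_SITE.finditer(seq):
            hits.append((strand, match.start(), match.group(1)))
    return hits
```

Counting genes with at least one hit among the induced set and among an
arbitrary unregulated set would give the kind of comparison reported in
the text.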

Use of a DNA microarray to characterize the transcriptional
consequences of mutations affecting the activity of regulatory
molecules provides a simple and powerful approach to dissection and
characterization of regulatory pathways and networks. This strategy
also has an important practical application in drug screening.
Mutations in specific genes encoding candidate drug targets can serve
as surrogates for the ideal chemical inhibitor or modulator of their
activity. DNA microarrays can be used to define the resulting signature
pattern of alterations in gene expression, and then subsequently used
in an assay to screen for compounds that reproduce the desired
signature pattern.

DNA microarrays provide a simple and economical way to explore gene
expression patterns on a genomic scale. The hurdles to extending this
approach to any other organism are minor. The equipment required for
fabricating and using DNA microarrays (9) consists of components that
were chosen for their modest cost and simplicity. It was feasible for a
small group to accomplish the amplification of more than 6000 genes in
about 4 months and, once the amplified gene sequences were in hand,
only 2 days were required to print a set of 110 microarrays of 6400
elements each. Probe preparation, hybridization, and fluorescent
imaging are also simple procedures. Even conceptually simple
experiments, as we described here, can yield vast amounts of
information. The value of the information from each experiment of this
kind will progressively increase as more is learned about the functions
of each gene and as additional experiments define the global changes in
gene expression in diverse other natural processes and genetic
perturbations. Perhaps the greatest challenge now is to develop
efficient methods for organizing, distributing, interpreting, and
extracting insights from the large volumes of data these experiments
will provide.

We describe here a method for drug target validation and identification
of secondary drug target effects based on genome-wide gene expression
patterns. The method is demonstrated by several experiments, including
treatment of yeast mutant strains defective in calcineurin,
immunophilins or other genes with the immunosuppressants cyclosporin A
or FK506. Presence or absence of the characteristic drug 'signature'
pattern of altered gene expression in drug-treated cells with a
mutation in the gene encoding a putative target established whether
that target was required to generate the drug signature. Drug-dependent
effects were seen in 'targetless' cells, showing that FK506 affects
additional pathways independent of calcineurin and the immunophilins.
The described method permits the direct confirmation of drug targets
and recognition of drug-dependent changes in gene expression that are
modulated through pathways distinct from the drug's intended target.
Such a method may prove useful in improving the efficiency of drug
development programs.

Good drugs are potent and specific; that is, they must have strong
effects on a specific biological pathway and minimal effects on all
other pathways. Confirmation that a compound inhibits the intended
target (drug target validation) and the identification of undesirable
secondary effects are among the main challenges in developing new
drugs. Comprehensive methods that enable researchers to determine which
genes or activities are affected by a given drug might improve the
efficiency of the drug discovery process by quickly identifying
potential protein targets, or by accelerating the identification of
compounds likely to be toxic. DNA microarray technology, which permits
simultaneous measurement of the expression levels of thousands of
genes, provides a comprehensive framework to determine how a compound
affects cellular metabolism and regulation on a genomic scale. DNA
microarrays that contain essentially every open reading frame (ORF) in
the Saccharomyces cerevisiae genome have already been used successfully
to explore the changes in gene expression that accompany large changes
in cellular metabolism or cell cycle progression.

In the modern drug discovery paradigm, which typically begins with the
selection of a single molecular target, the ideal inhibitory drug is
one that inhibits a single gene product so completely and so
specifically that it is as if the gene product were absent. Treating
cells with such a drug should induce changes in gene expression very
similar to those resulting from deleting the gene encoding the drug's
target. Here we have compared the genome-wide effects on gene
expression that result from deletions of various genes in the budding
yeast S. cerevisiae to the effects on gene expression that result from
treatment with known inhibitors of those gene products. Using the
calcineurin signaling pathway as a model system, we tested an approach
that permits identification of genes that encode proteins specifically
involved in pathways affected by a drug. The FK506 characteristic
pattern, or 'signature', of altered gene expression was not observed in
mutant cells lacking proteins inhibited by FK506 (for example, a
calcineurin or FK506-binding-protein mutant strain), but was observed
in mutants deleted for genes in pathways unrelated to FK506 action (for
example, a cyclophilin mutant strain). Conversely, the cyclosporin A
(CsA) signature was not observed in CsA-treated calcineurin or
cyclophilin mutant strains, but was seen in an FK506-binding-protein
mutant strain treated with CsA. The method also demonstrates that
FK506, a clinically used immunosuppressant, has 'off-target' effects
that are independent of its binding to immunophilins. Thus, the
approach we describe may provide a way to identify the pathways altered
by a drug and to detect drug effects mediated through unintended
targets.

Null mutants phenocopy drug-treated cells on a genomic scale
To test whether a null mutation in a drug target serves as a model of
an ideal inhibitory drug, we examined the effects on gene expression
associated with pharmacological or genetic inhibition of calcineurin
function. Calcineurin is a highly conserved calcium- and
calmodulin-activated serine/threonine protein phosphatase implicated in
diverse processes dependent on calcium signaling. In budding yeast,
calcineurin is required for intracellular ion homeostasis, for
adaptation to prolonged mating pheromone treatment and in the
regulation of


NATURE MEDICINE • VOLUME 4 • NUMBER 11 • NOVEMBER 1998

© 1998 Nature America Inc. • http://medicine.nature.[...]

Fig. 1 [illegible] of the calcineurin signaling pathway mediated by
FK506 and cyclosporin A (CsA). Calcineurin is composed of a catalytic
subunit (calcineurin A, encoded in yeast by the CNA1 and CNA2 genes)
and calcium-binding regulatory subunits calmodulin (CMD) and
calcineurin B [illegible] specifically bind and inhibit the
peptidyl-prolyl isomerase activity of their respective immunophilins,
FK506 [illegible]. Drug-immunophilin complexes bind and inhibit the
calcium- and calmodulin-stimulated phosphatase calcineurin. Among the
substrates of calcineurin are transcriptional activators that act to
modulate gene expression.



the onset of mitosis. In mammals, calcineurin has been implicated in
T-cell activation, in apoptosis, in cardiac hypertrophy and in the
transition from short-term to long-term memory. In both organisms,
calcineurin activity is inhibited by FK506 and CsA, immunosuppressant
drugs whose effects on calcineurin are mediated through families of
intracellular receptor proteins called immunophilins (Fig. 1). To
assess the effects of pharmacologic inhibition of calcineurin,
wild-type S. cerevisiae was grown to early logarithmic phase in the
presence or absence of FK506 or CsA. Isogenic cells, from which the
genes encoding the catalytic subunits of calcineurin (CNA1 and CNA2)
had been deleted (referred to as the cna or calcineurin mutant), were
grown in parallel in the absence of the drug. Fluorescently labeled
cDNA was prepared by reverse transcription of polyA+ RNA in the
presence of Cy3- or Cy5-deoxynucleotide triphosphates and then
hybridized to a microarray containing more than 6,000 DNA probes
representing 97% of the known or predicted ORFs in the yeast genome.
Simultaneous hybridization of Cy5-labeled cDNA from mock-treated cells
and Cy3-labeled cDNA from cells treated with 1 µg/ml FK506 allowed the
effect of drug treatment on mRNA levels of each ORF to be determined
(Fig. 2a and b and data not shown). Similarly, the effects of the
calcineurin mutations on the mRNA levels of each gene were assessed by
simultaneous hybridization of Cy5-labeled cDNA from wild-type cells and
Cy3-labeled cDNA from the calcineurin mutant strain (Fig. 2c). For each
comparison of this kind, reported expression ratios are the average of
at least two hybridizations in which the Cy3 and Cy5 fluors were
reversed to remove biases that may be introduced by gene-specific
differences in incorporation of the two fluors (data not shown).
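
The fluor-reversal averaging described above can be illustrated with a
short sketch. It assumes, for illustration only, that each
hybridization is summarized per gene as a pair of channel intensities
plus a note of which channel carried the treated sample, and that the
average is taken over log10 ratios; the text says only that ratios from
reversed-fluor hybridizations were averaged, so these details and the
function names are hypothetical.

```python
import math


def log_ratio(cy5, cy3, treated_channel):
    """log10(treated / control) for one gene on one array."""
    if treated_channel == "cy5":
        return math.log10(cy5 / cy3)
    if treated_channel == "cy3":
        return math.log10(cy3 / cy5)
    raise ValueError("treated_channel must be 'cy5' or 'cy3'")


def dye_swap_average(hybridizations):
    """Average log10(treated / control) over reversed-fluor hybridizations.

    `hybridizations` is a list of (cy5, cy3, treated_channel) tuples for
    a single gene. A multiplicative bias that favors one fluor enters the
    two orientations with opposite sign and cancels in the average.
    """
    logs = [log_ratio(cy5, cy3, channel) for cy5, cy3, channel in hybridizations]
    return sum(logs) / len(logs)
```

For example, a gene that is truly induced twofold but labels 1.5 times
more efficiently with Cy5 would read as threefold in one orientation
and about 1.33-fold in the other; the log average recovers the twofold
change.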

Treatment with FK506 in these growth conditions resulted in a signature
pattern of altered gene expression in which mRNA levels of 36 ORFs
changed by more than twofold (http://www.rosetta.org). A very similar
pattern of altered gene expression was observed when the calcineurin
mutant strain was compared to wild-type cells. Comparison of the
changes in mRNA expression of each gene resulting from treatment of
wild-type cells with FK506 with mRNA expression changes resulting from
deletion of the calcineurin genes showed the considerable similarity of
the global transcript alterations in response to the two perturbations
(Fig. 2d). Quantification of this similarity using the correlation
coefficient (ρ) showed large correlations between the FK506 treatment
signature and the calcineurin deletion signature (ρ = 0.75 ± 0.03) as
well as the CsA treatment signature (ρ = 0.94 ± 0.02), but not with a
randomly selected deletion mutant strain (deleted for the YER071C gene;
ρ = -0.07 ± 0.04; Fig. 2e). The FK506 treatment signature was also
compared with those of more than 40 other deletion mutant strains or
drug treatments thought to affect unrelated pathways, and none had
statistically significant correlations. These data establish that
genetic disruption of calcineurin function provides a close and
specific phenocopy of treatment with FK506 or CsA.

[illegible] example, we also compared [illegible] 3-aminotriazole
(3-AT) with the effects of deletion of the HIS3 gene. HIS3 encodes
imidazoleglycerol phosphate dehydratase, which [illegible] a
competitive inhibitor of this enzyme that triggers a large
transcriptional amino-acid starvation response. Microarray analysis of
wild-type and isogenic his3-deficient [illegible] genome-wide
transcriptional responses (involving more than 1,000 ORFs) resulting
from treatment with 3-AT (Fig. 3a) or from HIS3 deletion [illegible] of
the 3-AT treatment signature [illegible] a high level of correlation
[illegible] experienced small changes in expression level (Fig. 3b). As
a negative control, [illegible] signature or the his3 mutant signature
and the calcineurin mutant strain were not statistically significant
(ρ = 0.09 ± 0.06 and -0.01 ± 0.04, respectively). That both the
calcineurin/FK506 and the his3/3-AT comparisons were highly correlated
indicates that in many cases [illegible] resembles the expression
profile of wild-type cells treated with an inhibitor of that gene's
product.



[illegible] with deletion mutants
Because pharmacological inhibition of different targets might give
similar or identical expression profiles, simple comparison of drug
signatures to mutant signatures is unlikely to unambiguously identify a
drug's target. To overcome this limitation, an additional 'decoder'
step is used. We first compare the expression [illegible] to the
expression profiles from a panel of genetic mutant strains, using a
correlation coefficient metric. Mutant strains whose expression profile
is [illegible] subjected to drug treatment, generating the drug
signature in the mutant strain (that is, the mutant drug signature). If
the mutated gene encodes a protein involved in a pathway affected
[illegible]

Fig. 2 Expression profiles from FK506-treated wild-type (wt) cells and
a calcineurin-disruption mutant strain share a genome-wide correlation.
DNA microarray analysis showing changes in gene expression resulting
from FK506 treatment (a and b) or from genetic disruption of genes
encoding calcineurin (c). a, Pseudo-color image of the results of
simultaneous hybridization of Cy5-labeled cDNA (red) from mock-treated
strain R563 and Cy3-labeled cDNA (green) from strain R563 treated with
1 µg/ml FK506. b, Enlarged view of the boxed area in a. Arrowheads
indicate specific ORFs induced or repressed. c, Pseudo-color image of
the results of simultaneous hybridization of Cy5-labeled cDNA (red)
from strain R563 and Cy3-labeled cDNA (green) from strain MCY300
(deleted for the CNA1, CNA2 catalytic subunits of calcineurin). Arrows
indicate specific ORFs induced or repressed. d, The log10 of the
expression ratio for each ORF derived from the FK506 treatment
hybridizations is plotted versus the log10 of the expression ratio in
the calcineurin mutant hybridizations. ORFs that were induced or
repressed in both experiments are shown as green and red dots,
respectively. e, The log10 of the expression ratio for each ORF derived
from the FK506 treatment hybridizations is plotted versus the log10 of
the expression ratio in the yer071c mutant hybridizations. No ORFs were
induced or repressed in both experiments.

To illustrate this, we treated the his3 mutant strain with 3-AT. The
signature pattern of altered gene expression resulting from treatment
of the mutant strain with 3-AT was much less complex than that of the
3-AT signature in wild-type cells (Fig. 4). This is seen simply by
examining plots of mean intensity of the hybridization signal (which
approximately reflects level of expression) versus the expression ratio
for each ORF (Fig. 4). Genes that were expressed at higher or lower
levels in 3-AT treated cells or in his3 mutant cells are shown as red
and green dots, respectively. We analyzed the 3-AT signature in
wild-type (Fig. 4a) and his3 mutant cells (Fig. 4c), as well as the
his3 mutant strain signature (Fig. 4b). Whereas histidine limitation
induced by 3-AT induced more than 1,000 transcript-level changes in the
wild-type strain, few or no transcript-level changes were induced by
treatment of the his3-deletion strain with 3-AT. This indicates that
with the growth conditions used, essentially all of the effects of 3-AT
depend on or are mediated through the HIS3 gene product.

Applying this approach to the calcineurin signaling pathway showed the
specificity of the method. The calcineurin mutant strain and strains
with deletions in the genes encoding the most abundant immunophilins in
yeast (CPH1 and FPR1) were treated with either FK506 or CsA to
determine the profiles


Table 1  Signature correlation of expression ratios as a result of
FK506 treatment in various mutant strains

                      wild-type      cna            fpr1           cna fpr1       cph1
                      +/- FK506      +/- FK506      +/- FK506      +/- FK506      +/- FK506
wild-type +/- FK506   0.93 ± 0.04    -0.01 ± 0.07   -0.23 ± 0.07   0.12 ± 0.07    0.79 ± 0.03

[illegible] the absence of the FK506 signature specifically in the
calcineurin (cna) and fpr1 (major FK506 binding protein) deletion
mutants. cna represents the mutant with deletions of the catalytic
subunits of calcineurin, CNA1 and CNA2. The correlation coefficient
reported in the first column represents the correlation between two
pairs of hybridizations from independent wild-type FK506 experiments.


of altered gene expression resulting from drug treatment of the
mutant cells (that is, mutant +/- drug). We compared the drug
signatures in the mutants to the wild-type drug signature using
the correlation coefficient metric (Table 1). Although the signature
generated by treatment of wild-type cells with FK506 was
highly correlated to the calcineurin mutant strain signature
(ρ = 0.75 ± 0.03), it bore no similarity to the profile after treatment
of the calcineurin mutant strain with FK506 (ρ = -0.01 ± 0.07).
This indicates that FK506 was unable to elicit its normal
transcriptional response in the calcineurin mutant strain.
Likewise, treatment of the fpr1 mutant strain with FK506
elicited an expression profile that was not correlated to the
FK506 signature in the wild-type strain (ρ = -0.23 ± 0.07),
indicating that the FPR1 gene product is likely to be involved in the
pathway affected by FK506. The same was true for the cna fpr1
mutant strain. In contrast, treatment of the cph1 mutant strain
with FK506 generated an expression profile highly correlated
with the wild-type FK506 expression profile (ρ = 0.79 ± 0.03),
indicating the cph1 mutation did not block the mode of action
of FK506 and thus is not directly involved in the pathway
affected by FK506. We tabulated the change in expression in
response to FK506 in different mutant strains for all ORFs with
expression ratios greater than 1.8 in FK506-treated cells or in
the calcineurin mutant strain (Fig. 5a). The calcineurin mutant
strain signature and the FK506 responses in wild-type and the cph1
mutant strain are similar, and there are no transcript-level
changes (seen in black) for treatment of the calcineurin, fpr1 and
cna fpr1 mutant strains with FK506 (Fig. 5a).
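
The comparison above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not the authors' procedure: the correlation values are the ones quoted in the text, while the cut-off (the largest correlation reported in the Methods for unrelated control signatures) and the function names are assumptions introduced here.

```python
# Correlation of each FK506-treated deletion strain's profile with the
# wild-type FK506 signature: (rho, 95% half-width), as quoted in the text.
FK506_VS_WILD_TYPE = {
    "cna": (-0.01, 0.07),
    "fpr1": (-0.23, 0.07),
    "cph1": (0.79, 0.03),
}

# Assumed cut-off (not the authors' rule): 0.38 is the largest correlation
# the Methods report for unrelated control signatures.
CONTROL_CEILING = 0.38


def signature_lost(rho, half_width, ceiling=CONTROL_CEILING):
    """Return True if even the upper confidence limit is below the ceiling."""
    return rho + half_width < ceiling


def strains_blocking_response(results, ceiling=CONTROL_CEILING):
    """List deletion strains in which the drug signature is lost."""
    return sorted(
        strain
        for strain, (rho, half_width) in results.items()
        if signature_lost(rho, half_width, ceiling)
    )


print(strains_blocking_response(FK506_VS_WILD_TYPE))  # ['cna', 'fpr1']
```

Under this assumed cut-off, deletion of the calcineurin genes or of FPR1 abolishes the FK506 signature, whereas deletion of CPH1 does not, which matches the conclusion drawn in the text.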

Similar experiments and analyses with CsA provided further validation
of this approach. The expression profile elicited by treatment of
wild-type cells with CsA was highly correlated to the profile elicited
by mutation of the calcineurin genes (ρ = 0.71 ± 0.04), but did not
correlate with the expression profile resulting from treatment of the
calcineurin mutant strain with CsA (ρ = -0.05 ± 0.07; Table 2),
indicating that the genetic deletion of calcineurin interfered with the
ability of CsA to elicit its normal transcriptional response. Likewise,
the CsA signature was essentially absent in CsA-treated cph1 mutant
cells, and the expression profile of CsA-treated cph1 mutant cells
correlated poorly to that of CsA-treated wild-type cells
(ρ = 0.18 ± 0.07). Thus, the CPH1 gene product was required for the CsA
response seen in wild-type cells. Conversely, treatment of fpr1 mutant
cells with CsA resulted in an expression pattern very similar to the
profile of CsA-treated wild-type cells (ρ = 0.77 ± 0.03), indicating
that FPR1 was not necessary for the CsA-mediated effects. Analysis of
individual ORFs affected by CsA and their expression ratios over the
entire set of experiments confirmed that CPH1 and the genes encoding
calcineurin, but not FPR1, are necessary for the wild-type CsA response
(Fig. 5b). The observation that the profiles resulting from FK506 or
CsA drug treatment are similar to that of the calcineurin deletion
mutant strain might allow the prediction that calcineurin was involved
in the pathway affected by these drugs. But because the expression
profile of the fpr1 mutant strain did not bear a strong similarity to
the wild-type drug expression profile for FK506, it is obvious that the
drug treatment of the mutant strains was necessary to identify Fpr1,
but not Cph1, as a potential FK506 drug target. In the same way, the
'decoder' strategy was necessary to identify Cph1, but not Fpr1, as a
potential drug target for CsA.

Fig. 3 Expression profiles from a his3 mutant strain and wild-type (wt)
cells treated with 3-AT share a genome-wide correlation. DNA microarray
analysis showing changes in gene expression resulting from 3-AT
treatment (a) or from genetic disruption of the HIS3 gene (c).
a, Pseudo-color image of the results of simultaneous hybridization of
Cy5-labeled cDNA (red) from mock-treated wild-type strain R491 and
Cy3-labeled cDNA (green) from strain R491 treated with 10 mM 3-AT.
b, Plot of the log10 of the expression ratio for each ORF derived from
the 3-AT treatment hybridizations versus the log10 of the expression
ratio in the his3 mutant hybridizations. ORFs that were induced or
repressed in both experiments are shown as green and red dots,
respectively. The correlation of expression ratios applies not only to
genes with large expression ratios (for example, CHA1 and ARG1), but
also extends to genes with expression ratios less than 2 (for example,
ILV1 and CPH1). ILV1 is induced 1.9-fold and 1.5-fold, and CPH1 is
downregulated 1.9-fold and 1.7-fold, in cells treated with 3-AT and
his3 mutant cells, respectively. Two ORFs do not fall on the line
x = y. The leftmost point is the HIS3 data point, which is induced by
3-AT treatment but which is not absent from the his3 mutant strain. The
other point is YOR203w. Both data points are labeled HIS3 because
hybridization to YOR203w is most likely due to HIS3 mRNA, as YOR203w
overlaps the HIS3 open reading frame. c, Pseudo-color image of the
results of simultaneous hybridization of Cy5-labeled cDNA (red) from
wild-type strain R491 and Cy3-labeled cDNA (green) from strain R1226,
deleted for the HIS3 gene. Arrowheads indicate specific ORFs induced or
repressed.
[Fig. 3 image not reproduced; panel b axis label: log10 (R/G), his3 mutation.]

'Decoder' approach can identify secondary drug effects 
For a drug that has a single biochemical target, the strategy out- 
lined above may be useful in target validation. In many cases, 
however, a compound may affect multiple pathways and elicit 
a very complex signature. 'Decoding' such a complex signature
into the effects mediated through the intended target (the
'on-target' signature) and those mediated through unintended targets
(the 'off-target' signature) might be useful in evaluating a
compound's specificity. Our 'decoder' strategy is based on the
premise that the 'off-target' signature should be insensitive to the
genetic disruption of the primary target.

Fig. 4 Treatment of the his3 mutant strain with 3-AT shows nearly
complete loss of 3-AT signature. A plot of the log10 of the mean
intensity of hybridization for each ORF versus the log10 of its
expression ratio for each experiment is shown next to a pseudo-color
image of a representative portion of the microarray. ORFs that are
induced or repressed at the 95% confidence level are shown in green and
red, respectively. a, Expression profile from treatment of the
wild-type (wt) strain with 3-AT. Cy5-labeled cDNA (red) from
mock-treated strain R491 and Cy3-labeled cDNA (green) from strain R491
treated with 10 mM 3-AT. b, Expression profile from the his3 deletion
strain. Cy5-labeled cDNA (red) from strain R491 and Cy3-labeled cDNA
(green) from strain R1226, deleted for the HIS3 gene. c, Expression
profile of treatment of the his3 deletion strain with 3-AT. Cy3-labeled
cDNA (red) from his3-deleted strain R1226 and Cy5-labeled cDNA (green)
from strain R1226 treated with 10 mM 3-AT. Arrowheads indicate the DNA
probe and data point corresponding to the HIS3 gene. The blue dashed
line represents the threshold below which errors tend to increase
rapidly because spot intensities are not sufficiently above background
intensity.

Table 2 Signature correlation of expression ratios as a result of CsA
treatment in various mutant strains

Profile compared with wild-type +/- CsA    Correlation
wild-type +/- CsA                          0.94 ± 0.04
cna +/- CsA                                -0.05 ± 0.07
fpr1 +/- CsA                               0.77 ± 0.03
cph1 +/- CsA                               0.18 ± 0.07
cna cph1 +/- CsA                           -0.11 ± 0.07

Signature correlation shows the absence of the CsA signature
[illegible] catalytic subunits of calcineurin, CNA1 and CNA2. The
correlation coefficient [remainder illegible].

To determine whether the 'decoder' approach could identify an
'off-target' profile, we looked for a drug-responsive gene whose
expression is insensitive to deletion of the primary target. To
increase the likelihood of observing such genes, the same strains
described in Tables 1 and 2 were treated with higher concentrations
(50 µg/ml) of FK506. This led to a much [illegible] wild-type
[illegible], indicating that at this higher concentration, FK506 was
inhibiting or activating [illegible] ORFs in this expanded
FK506-induced expression profile were not affected by the calcineurin,
cph1 or fpr1 mutations, as drug treatment of these mutant strains did
not block their presence in the FK506 expression signature (Fig. 6).
This indicates that FK506 was triggering changes in transcript levels
of many genes through pathways independent of calcineurin, CPH1 and
FPR1. Many of the upregulated ORFs in the 'off-target' pathway were
genes reported to be regulated by the transcriptional activator Gcn4
(ref. 24). In some strains, a reporter gene under GCN4 control was
induced in response to FK506 treatment. To determine whether GCN4 is
involved in this pathway that is independent of calcineurin, CPH1 and
FPR1, we analyzed the effects of treatment with high-dose FK506 on
global gene expression in a strain with a GCN4 deletion (Fig. 6). Of
the 41 ORFs with calcineurin-independent expression ratios greater than
4, 32 were not induced in the gcn4 mutant, indicating that their
induction by FK506 was GCN4-dependent. Not all GCN4-regulated genes
were induced by FK506. This FK506-induced subset of GCN4-regulated
genes may be those most sensitive to subtle changes [illegible]
regulatory circuits prevent FK506 activation of some GCN4-regulated
genes. Seven of the remaining nine ORFs induced by FK506 were
independent of both the calcineurin and GCN4 pathways. The simplest
explanation is that FK506 inhibits or activates additional pathways.
Members of this class include SNQ2 and PDR5, genes that encode drug
efflux pumps with structural homology to mammalian multiple drug
resistance proteins. FK506 may interact directly with Pdr5 to inhibit
its function. Our results indicate that treatment with FK506 leads to
fourfold-to-sixfold induction of PDR5 mRNA levels. YOR1, another gene
that can confer drug resistance, [illegible] threefold-to-fourfold by
FK506. Thus, drug treatment of strains with mutations in the
[illegible] effects mediated by secondary drug targets, including the
nature and extent of [illegible].

Fig. 5 Response of FK506 and CsA signature genes in strains with
deletions in different genes. Genes with expression ratios greater than
a factor of 1.8 in [illegible] (left side) and their expression ratios
in the indicated strain are shown [illegible]. a, [illegible] signature
genes are in the first two columns. Almost all FK506 signature genes
have expression ratios near unity in deletion strains [illegible]
FK506 (calcineurin, fpr1 and cna fpr1 mutants) but not in deletion
strains in unrelated pathways (cph1). b, Calcineurin (cna) mutant and
CsA treatment signature genes are in the first two columns. Almost all
CsA signature genes have expression ratios near unity in deletion
strains involved in pathways affected by CsA (calcineurin, cph1 and
cna cph1 mutants) but not in deletion strains in unrelated pathways
(fpr1).

We describe here a method for drug target validation and the
identification of secondary drug target effects that uses DNA
microarrays to survey the effects of drugs on global gene expression
[illegible] inhibition of gene function can result in extremely similar
changes in gene expression. We also demonstrated that one can confirm a
potential drug target by treating a deletion mutant defective in the
gene encoding the putative target. Drug [illegible] pathways or
processes directly or indirectly affected by the drug bore little or



no similarity to the wild-type drug expression profile. In contrast,
drug-mediated signatures from strains with mutations in genes involved
in pathways unrelated to the drug's action showed extensive similarity
to the wild-type drug signature. By applying this approach to a drug
that affects multiple pathways (FK506), we were able to decode a
complex signature into component parts, including the identification
of an 'off-target' signature that was mediated through pathways
independent of calcineurin or the Fpr1 immunophilin.
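
The decoding of the high-dose FK506 signature into component parts, as set out in the Fig. 6 legend, can be pictured as a simple rule-based classification. The sketch below is an assumption-laden illustration rather than the authors' code: the strain keys, the boolean "still responds" inputs and the class labels (read from a partly illegible legend) are all introduced here for the example.

```python
# Deletion strains considered in this sketch (names are illustrative keys).
STRAINS = ("cna", "fpr1", "cph1", "gcn4")


def classify_gene(responds):
    """Assign a gene to a response class.

    `responds` maps each strain in STRAINS to True if the gene still
    responds to FK506 in that deletion strain, and to False otherwise.
    """
    lost = {strain for strain in STRAINS if not responds[strain]}
    if not lost:
        return "CNA- and GCN4-independent"
    if lost == {"cna", "fpr1"}:
        return "CN-dependent"
    if lost == {"gcn4"}:
        return "GCN4-dependent"
    return "complex behavior"


# Hypothetical response patterns, one per class.
examples = {
    "gene_A": {"cna": False, "fpr1": False, "cph1": True, "gcn4": True},
    "gene_B": {"cna": True, "fpr1": True, "cph1": True, "gcn4": False},
    "gene_C": {"cna": True, "fpr1": True, "cph1": True, "gcn4": True},
    "gene_D": {"cna": False, "fpr1": True, "cph1": True, "gcn4": False},
}
for name, pattern in examples.items():
    print(name, classify_gene(pattern))
```

Any pattern that fits neither the calcineurin/Fpr1 model nor the Gcn4 model falls into the "complex behavior" class, mirroring the catch-all class described in the legend.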



Discussion

It is well-established that high-throughput biochemical screening can
identify potent inhibitory compounds against a given target. The
'decoder' approach described here complements this process by
evaluating the equally important property of specificity: the tendency
of a compound to inhibit pathways other than that of its intended
target. The ability to observe such 'off-target' effects will likely be
useful in several ways. Profiling compounds with known toxicities will
allow the development of a database of expression changes associated
with particular toxicities. Recognition of potential toxicities in the
'off-target' signatures of otherwise promising compounds then may allow
earlier identification of those likely to fail in clinical trials.
Comparing the extent and peculiarities of 'off-target' signatures of
promising drug candidates could provide a new way to group compounds by
their effects on secondary pathways, even before those effects are
understood. This may prove to be an alternative, potentially more
effective, way to select compounds for animal and clinical trials. Some
drugs are more effective against a related protein than against the
originally intended target. Sildenafil (Viagra™), for example, was
initially developed as a phosphodiesterase inhibitor to control cardiac
contractility, but was found to be highly specific for
phosphodiesterase 5, an isozyme whose inhibition overcomes defects in
penile erection. It is possible that application of the 'decoder' to
other compounds may show that they too have a potent activity against a
target distinct from their intended target.

Fig. 6 Response of FK506 signature genes in strains with deletions in
different genes. Genes with expression ratios greater than a
[illegible] the indicated strain are shown in the green (induction)-red
(repression) color scale. The genes have been divided into classes
corresponding to these expected behaviors: 'CN-dependent' genes respond
to FK506 (50 µg/ml) except when either calcineurin genes [illegible].
'GCN4-dependent' genes respond to FK506 except when GCN4 is deleted.
These genes still respond to FK506 when calcineurin genes or FPR1 or
CPH1 are deleted; that is, their responses are not mediated by
calcineurin, Cph1, or Fpr1. 'CNA- and GCN4-independent' genes respond
to FK506 in all deletion strains tested. A 'complex behavior' class is
provided for those genes that did not match the model of FK506 response
mediated through calcineurin or Fpr1 or separately through Gcn4.

The ability to decode drug effects is dependent on the availability of
functionally 'targetless' cells. In yeast this is being achieved by
systematically disrupting each yeast gene (Saccharomyces Deletion
Consortium:
http://sequence-www.stanford.edu/group/yeast_deletion_project/deletion.html).
Efforts are underway to obtain expression profiles from each deletion
mutant strain. Determining signatures resulting from inactivation of
essential genes presents a unique problem, but it may be possible to do
so by examining heterozygotes or by using a controllable promoter to
reduce expression of the essential gene. Although it is already
feasible to test several compounds in dozens of yeast strains, another
challenge for the 'decoder' strategy will be the efficient selection of
the mutants with deletions in genes most likely to encode the intended
drug target. The signature correlation plots described are one metric
that could be used as part of that selection process, but others need
to be explored. Applying the 'decoder' to mammalian cells presents
additional challenges. It is considerably more difficult to isolate
functionally 'targetless' cells. Strategies involving titratable
promoters, known specific inhibitors, anti-sense RNAs, ribozymes, and
methods of targeting specific proteins for degradation are possible and
should be tested. Another limitation is that not all cell types express
the same set of genes and therefore 'off-target' effects may be
different in different cell types. In addition, applying the 'decoder'
to human cells will also require technical improvements that allow
expression profiling from a small number of cells. Even the broader
question of whether the insensitivity of 'off-target' signatures to the
disruption of the main target is the exception or the rule can only be
answered by the accumulation of more data. Barkai and Leibler, however,
have argued in favor of robustness of biological networks, indicating
that drug perturbations ('off-target' signatures) may be robust even
when the system is subjected to another perturbation (such as a genetic
disruption) (ref. 28). Many practical developments will be necessary if
the 'decoder' concept is to be broadly applied.

Expression arrays have been used mainly as an initial screen
for genes induced in a particular tissue or process of interest by
focusing on genes with large expression ratios. We have
found, however, that effort to refine experimental protocols
and repeat experiments increases the reliability of the data and
permits new applications. For example, it provides a larger set
of genes at higher confidence levels that serve as a more
unique signature for a given protein perturbation. In addition,
it allows subtle signatures to be detected, when, for example, a
protein is only partially inhibited. This may enable clinical
monitoring of small changes in protein function in disease or
toxicity states before they could otherwise be detected.
Because the functions of many genes detected on transcript arrays
are known, these microarrays are powerful tools that provide
detailed information about a cell's physiology. For example,
changes in the flux through a metabolic pathway are reflected in
transcriptional changes in genes in the pathway. Furthermore, it
may be possible to indirectly measure protein activity levels from
expression profiling data (S.F. et al., unpublished data). Thus,
although the eventual development of genomic methods allowing the
direct measurement of all cellular protein levels will be an
important achievement, transcript array technology offers an
immediate and robust means of evaluating the effects of various
treatments on gene expression and protein function.

Table 3 Yeast strains used

Strain   Relevant genotype                                                                              Reference
YPH499   Mata ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1                                      (34)
R563     Mata ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1 his3::HIS3                           (this study)
R558     Mata ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1 fpr1::HIS3                           (this study)
R567     Mata ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1 cph1::HIS3                           (this study)
MCY300   Mata ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1 cna1Δ1::hisG cna2Δ1::HIS3            (21)
R132     Mata ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1 cna1Δ1::hisG cna2Δ1 [illegible]      (this study)
R133     Mata ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1 cna1Δ1::hisG cna2Δ1 [illegible]      (this study)
R559     Mata ura3-52 lys2-801 ade2-101 trp1-Δ63 his3-Δ200 leu2-Δ1 his3::HIS3 gcn4::LEU2                (this study)
BY4719   Mata trp1-Δ63 ura3-Δ0                                                                          (35)
BY4738   Matα trp1-Δ63 ura3-Δ0                                                                          (35)
R491     Mata/α BY4719 x BY4738                                                                         (this study)
BY4728   Mata his3-Δ200 trp1-Δ63 ura3-Δ0                                                                (35)
BY4729   Matα his3-Δ200 trp1-Δ63 ura3-Δ0                                                                (35)
R1226    Mata/α BY4728 x BY4729                                                                         (this study)

Methods 

Construction, growth and drug treatment of yeast strains. The strains
used in this study (Table 3) were constructed by standard techniques.
To construct strain R559, strain R563 was transformed to Leu+ with
plasmid pM12 digested by SalI and MluI (provided by A. Hinnebusch and
T. Dever). Strains R132 and R133 were constructed by transforming the
bacterial kanamycin resistance cassette flanked by genomic DNA from the
CPH1 and FPR1 loci, respectively, and selecting for G418-resistant
colonies. For experiments with FK506, cells were grown for three
generations to a density of 1 x 10^7 cells/ml in YAPD medium (YPD plus
0.004% adenine) supplemented with 10 mM calcium chloride as described.
Where indicated, FK506 was added to a final concentration of 1 µg/ml
0.5 h after inoculation of the culture or to 50 µg/ml 1 h before cells
were collected. CsA was used at a final concentration of 50 ng/ml.
Cells were broken by standard procedures with the following
modifications: Cell pellets were resuspended in breaking buffer (0.2 M
Tris-HCl pH 7.6, 0.5 M NaCl, 10 mM EDTA, 1% SDS), vortexed for 2 min on
a VWR multi-tube vortexer at setting 8 in the presence of 60% glass
beads (425-600 µm mesh; Sigma) and phenol:chloroform (50:50,
volume/volume). After separation of the phases, the aqueous phase was
re-extracted and ethanol-precipitated. Poly A+ RNA was isolated by two
sequential chromatographic purifications over oligo dT cellulose (New
England Biolabs, Beverly, Massachusetts) using established protocols.

For experiments using 3-AT, wild-type or his3/his3 cells were grown to
early logarithmic phase in SC medium, pelleted and resuspended in SC
medium lacking histidine for 1 h in the presence or absence of 10 mM
3-AT, as indicated. Cells were harvested and mRNA isolated as above.
FK506 was obtained from the Swedish Hospital Pharmacy (Seattle,
Washington) and purified to homogeneity by ethyl acetate extraction by
J. Simon (Fred Hutchinson Cancer Research Center, Seattle, Washington).
CsA was obtained from Alexis Biochemicals (San Diego, California); 3-AT
was from Sigma.

Preparation and hybridization of the labeled sample. Fluorescently
labeled cDNA was prepared, purified and hybridized essentially as
described. Cy3- or Cy5-dUTP (Amersham) was incorporated into cDNA
during reverse transcription (Superscript II; Life Technologies) and
purified by concentrating to less than 10 µl using Microcon-30
microconcentrators (Amicon, Houston, Texas). Paired cDNAs were
resuspended in 20-26 µl hybridization solution (3 x SSC, 0.75 ng/ml
polyA DNA, 0.2% SDS) and applied to the microarray under a 22 x 50-mm
coverslip for 6 h at 63 °C, all according to a published method.

Fabrication and scanning of microarrays. PCR products containing
common 5' and 3' sequences (Research Genetics, Huntsville, Alabama)
were used as templates with amino-modified forward primer and
unmodified reverse primers to PCR amplify 6,065 ORFs from the
S. cerevisiae genome. Our first-pass success rate was 94%.
Amplification reactions that gave products of unexpected sizes were
excluded from subsequent analysis. ORFs that could not be amplified
from purchased templates were amplified from genomic DNA. DNA samples
from 100-µl reactions were isopropanol-precipitated, resuspended in
water, brought to a final concentration of 3 x SSC in a total volume of
15 µl, and transferred to 384-well microtiter plates (Genetix Limited,
Christchurch, Dorset, England). PCR products were spotted onto
1 x 3-inch polylysine-treated glass slides by a robot built essentially
according to defined specifications
(http://cmgm.stanford.edu/pbrown/MGuide). After being printed, slides
were processed according to published protocols.

Microarrays were imaged on a prototype multi-frame CCD camera in
development at Applied Precision (Issaquah, Washington). Each CCD
image frame was approximately 2-mm square. Exposure times of 2 s in
the Cy5 channel (white light through Chroma 618-648 nm excitation
filter, Chroma 657-727 nm emission filter) and 1 s in the Cy3 channel
(Chroma 535-560 nm excitation filter, Chroma 570-620 nm emission
filter) were done consecutively in each frame before moving to the
next, spatially contiguous frame. Color isolation between the Cy3 and
Cy5 channels was about 100:1 or better. Frames were 'knitted' together
in software to make the complete images. The intensities of spots
(about 100 µm) were quantified from the 10-µm pixels by frame-by-frame
background subtraction and intensity averaging in each channel. Dynamic
range of the resulting spot intensities was typically a ratio of 1,000
between the brightest spots and the background-subtracted additive
error level. Normalization between the channels was accomplished by
normalizing each channel to the mean intensities of all genes. This
procedure is nearly equivalent to normalization between channels using
the intensity ratio of genomic DNA spots, but is possibly more robust,
as it is based on the intensities of several thousand spots distributed
over the array.
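
The channel normalization described above can be illustrated with a minimal Python sketch. It assumes background-subtracted spot intensities are already available as plain lists, one value per gene and channel; the function names and the toy numbers are introduced here for illustration and are not taken from the authors' software.

```python
def normalize_channel(intensities):
    """Scale one channel so that its mean spot intensity equals 1."""
    mean_intensity = sum(intensities) / len(intensities)
    return [value / mean_intensity for value in intensities]


def expression_ratios(cy5, cy3):
    """Per-gene Cy5/Cy3 ratios after normalizing each channel to its mean."""
    norm_cy5 = normalize_channel(cy5)
    norm_cy3 = normalize_channel(cy3)
    return [red / green for red, green in zip(norm_cy5, norm_cy3)]


# Toy data: the Cy5 channel is uniformly twice as bright as the Cy3 channel,
# so normalization should bring every ratio back to about 1.
cy5 = [200.0, 400.0, 600.0]
cy3 = [100.0, 200.0, 300.0]
print(expression_ratios(cy5, cy3))
```

Because each channel is divided by a mean taken over all genes, a global difference in labeling or detection efficiency between the two dyes cancels out of the ratios.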

Signature correlation coefficients and their confidence limits.
Correlation coefficients between the signature ORFs of various
experiments were calculated using:

$$\rho = \frac{\sum_k x_k y_k}{\left(\sum_k x_k^2 \, \sum_k y_k^2\right)^{1/2}}$$

where x_k is the log10 of the expression ratio for the k-th gene in the
x signature, and y_k is the log10 of the expression ratio for the k-th
gene in the y signature. The summation is over those genes that were
either up- or down-regulated in either experiment at the 95% confidence
level. These genes each had a less than 5% chance of being actually
unregulated (having expression ratios departing from unity due to
measurement errors alone). This confidence level was assigned based on
an error model which assigns a lognormal probability distribution to
each gene's expression ratio with characteristic width based on the
observed scatter in its repeated measurements (repeated arrays at the
same nominal experimental conditions) and on the individual array
hybridization quality. This latter dependence was derived from control
experiments in which both Cy3 and Cy5 samples were derived from the
same RNA sample. For large numbers of repeated measurements the error
reduces to the observed scatter. For a single measurement the error is
based on the array quality and the spot intensity.
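
As a minimal sketch of the correlation metric defined above, the following Python code computes the uncentered correlation of two sets of log10 expression ratios, summing only over genes flagged as significant in either experiment. The data layout (a dictionary mapping gene name to a log ratio and a significance flag) is an assumption made for this example, and the error model that would set the flags is not reproduced.

```python
import math


def signature_correlation(x, y):
    """Uncentered correlation of two equal-length lists of log10 ratios."""
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return numerator / denominator


def correlate_signatures(profile_x, profile_y):
    """Correlate two profiles over genes significant in either one.

    Each profile maps gene name -> (log10_ratio, is_significant).
    """
    genes = [
        gene
        for gene in profile_x
        if gene in profile_y and (profile_x[gene][1] or profile_y[gene][1])
    ]
    xs = [profile_x[gene][0] for gene in genes]
    ys = [profile_y[gene][0] for gene in genes]
    return signature_correlation(xs, ys)


# Hypothetical profiles: gene "b" is significant in neither and is skipped.
profile_x = {"a": (0.5, True), "b": (0.01, False), "c": (-0.4, False)}
profile_y = {"a": (0.5, True), "b": (-0.02, False), "c": (-0.4, True)}
print(correlate_signatures(profile_x, profile_y))
```

Note that no mean is subtracted from either signature, in line with the formula as written; the sketch does not guard against the degenerate case in which no gene passes the significance filter.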

Random measurement errors in the x and y signatures tend to bias the 
correlation towards zero. In most experiments, most genes are not signif- 
icantly affected but do show small random measurement errors. Selecting 
only the '95% confidence' genes for the correlation calculation, rather 
than the entire genome, reduces this bias and makes the actual biological 
correlations more apparent.
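
The bias toward zero, and the effect of restricting the sum to significant genes, can be reproduced in a small simulation. Everything in the sketch below is an assumption chosen for illustration (number of genes, effect size, Gaussian noise on the log ratios, a 1.96-sigma significance cut-off); it is not the authors' error model. Both simulated profiles share exactly the same underlying biological signal, so any departure from a correlation of 1 is caused by measurement noise alone.

```python
import math
import random


def uncentered_correlation(x, y):
    """Correlation as defined in the text, without mean subtraction."""
    numerator = sum(a * b for a, b in zip(x, y))
    return numerator / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))


def simulate_bias(n_genes=2000, n_responsive=100, effect=0.5,
                  noise_sd=0.1, seed=0):
    """Return (rho over all genes, rho over significant genes only)."""
    rng = random.Random(seed)
    # True log10 ratios: a minority of genes respond, the rest are unchanged.
    true = [rng.choice((-effect, effect)) if i < n_responsive else 0.0
            for i in range(n_genes)]
    # Two independent noisy measurements of the same underlying signal.
    x = [t + rng.gauss(0.0, noise_sd) for t in true]
    y = [t + rng.gauss(0.0, noise_sd) for t in true]
    # Keep genes that look regulated in either measurement.
    cutoff = 1.96 * noise_sd
    keep = [i for i in range(n_genes)
            if abs(x[i]) > cutoff or abs(y[i]) > cutoff]
    rho_all = uncentered_correlation(x, y)
    rho_selected = uncentered_correlation([x[i] for i in keep],
                                          [y[i] for i in keep])
    return rho_all, rho_selected


rho_all, rho_selected = simulate_bias()
print(rho_all, rho_selected)
```

With these settings the genome-wide correlation falls well below the correlation computed over the selected genes, because the many unaffected genes add noise to the denominator without adding signal to the numerator.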

Correlations between a profile and itself are unity by definition.
Error limits on the correlation are 95% confidence limits based on the
individual measurement error bars, and assuming uncorrelated errors.
They do not include the bias mentioned above; thus, a departure of ρ
from unity does not necessarily mean that the underlying biological
correlation is imperfect. However, a correlation of 0.7 ± 0.1, for
example, is very significantly different from zero. Small (magnitude of
ρ < 0.2) but formally significant correlations in the tables and text
probably are due to small systematic biases in the Cy5/Cy3 ratios that
violate the assumption of independent measurement errors used to
generate the 95% confidence limits. Therefore, these small correlation
values should be treated as not significant. A likely source of
uncorrected systematic bias is the partially corrected scanner detector
nonlinearity that differently affects the Cy3 and Cy5 detection
channels.

The 1 µg/ml FK506 treatment signature was compared with more than 40
unrelated deletion mutant strain or drug signatures. These control
profiles had correlation coefficients with the FK506 profile that were
distributed around zero (mean ρ = -0.03) with a standard deviation of
0.16 (data not shown), and none had correlations greater than ρ = 0.38.
Similarly, the calcineurin mutant strain signature correlated well with
the CsA treatment signature (ρ = 0.71 ± 0.04) but not with the
signatures from the negative controls (mean ρ = -0.02 with a standard
deviation of 0.18).
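
To put these control figures in perspective, one can express an observed correlation as a number of control standard deviations above the control mean. The short sketch below does this for the FK506 comparison, using the correlation of 0.75 quoted in the Results together with the control mean and standard deviation given above; the function name is introduced here, and reading the result as a significance measure would further assume that the control correlations are roughly normally distributed.

```python
def deviations_from_controls(rho, control_mean, control_sd):
    """Distance of rho from the control mean, in control standard deviations."""
    return (rho - control_mean) / control_sd


# Wild-type FK506 signature versus calcineurin mutant signature (rho = 0.75),
# compared with the unrelated controls (mean -0.03, standard deviation 0.16).
z = deviations_from_controls(0.75, -0.03, 0.16)
print(z)  # about 4.9 control standard deviations
```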



Quality controls. End-to-end checks on expression ratio measurement
accuracy were provided by analyzing the variance in repeated
hybridizations using the same mRNA labeled with both Cy3 and Cy5, and
also using Cy3 and Cy5 mRNA samples isolated from independent cultures
of the same nominal strain and conditions. Biases undetected with this
procedure, such as gene-specific biases presumably due to differential
incorporation of Cy3- and Cy5-dUTP into cDNA, were minimized by doing
hybridizations in fluor-reversed pairs, in which the Cy3/Cy5 labeling
of the biological conditions was reversed in one experiment with
respect to the other. The expression ratio for each gene is then the
ratio of ratios between the two experiments in the pair. Other biases
are removed by algorithmic numerical de-trending. The magnitude of
these biases in the absence of de-trending and fluor reversal is
typically about 30% in the ratio, but may be as high as twofold for
some ORFs.

Expression ratios are based on mean intensities over each spot. Some
smaller spots have fewer image pixels in the average. This does not
degrade accuracy noticeably until the number of pixels falls below ten,
in which case the spot is rejected from the data set. 'Wander' of spot
positions with respect to the nominal grid is adaptively tracked in
array subregions by the image processing software. Unequal spot
'wander' within a subregion greater than half-a-spot spacing is a
difficulty for the automated quantitating algorithms; in this case, the
spot is rejected from analysis based on human inspection of the
'wander'. Any spots partially overlapping are excluded from the data
set. Less than 1% of spots typically are rejected for these reasons.
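
The fluor-reversed pairing described under Quality controls can be sketched as follows. The model is an assumption made for illustration: each measured Cy5/Cy3 ratio is taken to be the true ratio multiplied by a gene-specific dye bias, and the square root is applied to the ratio of ratios so that the result is on the scale of a single expression ratio. The text itself only states that the ratio of ratios is used.

```python
import math


def fluor_reversed_ratio(forward_ratio, reversed_ratio):
    """Combine a fluor-reversed pair into one expression ratio.

    forward_ratio:  Cy5(treated) / Cy3(control)
    reversed_ratio: Cy5(control) / Cy3(treated)
    A multiplicative dye bias common to both measurements cancels in
    the ratio of ratios; the square root restores the original scale.
    """
    return math.sqrt(forward_ratio / reversed_ratio)


# Hypothetical gene: true induction of 2-fold, dye bias of 1.3 toward Cy5.
true_ratio, dye_bias = 2.0, 1.3
forward = true_ratio * dye_bias      # 2.6
reverse = dye_bias / true_ratio      # 0.65
print(fluor_reversed_ratio(forward, reverse))  # close to 2.0
```

A bias of 1.3 corresponds to the roughly 30% error mentioned in the text for a single, unreversed hybridization; in this idealized model it is removed completely by the pairing.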
cones nor cylinders formed at concentrations of 0.5
M NaCl or below. Absorption spectra demonstrated
that our CA-NC preparations were not contaminated
with Escherichia coli RNA (estimated lower detection
limit was ~1 base/protein molecule). To control for
even lower levels of RNA contamination, we preincubated
the CA-NC protein with 0.5 mg/ml ribonuclease A
(Type I-AS, 54 Kunitz U/mg, Sigma) for 1
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The Transcriptional Program in 
the Response of Human 
Fibroblasts to Serum 

Vishwanath R. Iyer, Michael B. Eisen, Douglas T. Ross,
Greg Schuler, Troy Moore, Jeffrey C. F. Lee, Jeffrey M. Trent,
Louis M. Staudt, James Hudson Jr., Mark S. Boguski,
Deval Lashkari, Dari Shalon, David Botstein, Patrick Brown*

The temporal program of gene expression during a model physiological
response of human cells, the response of fibroblasts to serum, was
explored with a complementary DNA microarray representing about 8600
different human genes. Genes could be clustered into groups on the
basis of their temporal patterns of expression in this program. Many
features of the transcriptional program appeared to be related to the
physiology of wound repair, suggesting that fibroblasts play a larger
and richer role in this complex multicellular response than had
previously been appreciated.



The response of mammalian fibroblasts to serum has been used as a
model for studying growth control and cell cycle progression (1).
Normal human fibroblasts require growth factors for proliferation in
culture; these growth factors are usually provided by fetal bovine
serum (FBS). In the absence of growth factors, fibroblasts enter a
nondividing state, termed G0, characterized by low metabolic activity.
Addition of FBS or purified growth factors induces proliferation of
the fibroblasts; the changes in gene expression that accompany this
proliferative response have been the subject of many studies, and the
responses of dozens of genes to serum have been characterized.

We took a fresh look at the response of human fibroblasts to serum,
using cDNA microarrays representing about 8600 distinct human genes to
observe the temporal program of transcription that underlies this
response. Primary cultured fibroblasts from human neonatal foreskin
were induced to enter a quiescent state by serum deprivation for 48
hours and then stimulated by addition of medium containing 10% FBS (2).
DNA microarray hybridization was used to measure the temporal changes
in mRNA levels of 8613 human genes (3) at 12 times, ranging from 15 min
to 24 hours after serum stimulation. The cDNA made from purified mRNA
from each sample was labeled with the fluorescent dye Cy5 and mixed
with a common reference probe consisting of cDNA made from purified
mRNA from the quiescent


Fig. 1. The same seaion of 
the miaoarray is shown 
for three independent hy- 
bridizations comparing RNA 
isolated at the 8-hour time 
point after serum treat- 
ment to RNA from serum- 
deprived cells. Each mi- 
aoan^ay contained 9996 
elements, including 9804 
hunrwn cDNAs. represent- 
ing 8613 different genes. 
mRNA from serum-de- 
prived cells was used to 
prepare cDNA labeled with 
Cy3-deoxyuridine Uiphosphate (dUTP), and mRNA harvested from cells at different times after serum 
stimulation was used to prepare cDNA labeled with Cy5-dUTP. The two cDNA probes were mixed and 
simultaneously hybnd.ied to the miaoarray. The image of the subsequent scan shows genes whose 
« o^^n^l' more abundant |n the serum-deprived fibroblasts (that is. suppressed by serum treatment) 
as green spots and genes whose mRNAs are more abundant in the serum-treated fibroblasts as red 
spou YeUow spots represent genes whose expression does not vary substantially between the two 
r!iTt^ /-^T^,'??^ o^^^ representing the following genes: 1. protein disulfide isomerase- 

related protein PS; 2. IL-8 precursor: 3. EST AA057170; and 4. vascular endothelial growth factor 
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culture (dme zero) labeled with a second fluo- 
rescent dye, Cy3 (4), The color images of the 
hybridization results (Fig. 1) were made by 
r epre s e n ting the Cy3 fluorescent image as 
green and the Cy5 fluorescent image as red and 
merging the two color images. 
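
The red/green/yellow reading of a two-color hybridization can be sketched in a
few lines. This is only an illustration under stated assumptions: the function
names and the twofold cutoff are hypothetical and are not taken from the report,
which does not state how the image colors were thresholded.

```python
import math

def log2_ratio(cy5, cy3):
    """Return log2(Cy5/Cy3) for one array element (sample vs. reference)."""
    if cy5 <= 0 or cy3 <= 0:
        raise ValueError("fluorescence intensities must be positive")
    return math.log2(cy5 / cy3)

def spot_color(cy5, cy3, fold=2.0):
    """Classify a spot by a hypothetical fold cutoff (assumption, not from the source)."""
    ratio = log2_ratio(cy5, cy3)
    cutoff = math.log2(fold)
    if ratio >= cutoff:
        return "red"     # more abundant in the Cy5-labeled (serum-treated) sample
    if ratio <= -cutoff:
        return "green"   # more abundant in the Cy3-labeled (serum-deprived) sample
    return "yellow"      # roughly equal signal in both channels
```

Working on the log scale treats induction and repression symmetrically, so a
fourfold increase and a fourfold decrease sit at equal distance from zero.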

Diverse temporal profiles of gene expression could be seen among the 8613 genes
surveyed in this experiment (Fig. 2); many of these genes (about half) were
unnamed expressed sequence tags (ESTs) (5). Although diverse patterns of
expression were observed, the orderly choreography of the expression program
became apparent when the results were analyzed by a clustering and display
method developed in our laboratory for analyzing genome-wide gene expression
data (6). An example of such an analysis, here applied to a subset of 517 genes
whose expression changed substantially in response to serum (7), is shown in
Fig. 2. The entire detailed data set underlying Fig. 2 is available as a
tab-delimited table (in cluster order) at the Science Web site
(www.sciencemag.org/feature/data/984559.shl). In addition, the entire, larger
data set for the complete set of genes analyzed in this experiment can be found
at a Web site maintained by our laboratory (genome-www.stanford.edu/serum) (8).

Fig. 2. Cluster image showing the different classes of gene expression
profiles. Five hundred seventeen genes whose mRNA levels changed in response to
serum stimulation were selected (7). This subset of genes was clustered
hierarchically into groups on the basis of the similarity of their expression
profiles by the procedure of Eisen et al. (6). The expression pattern of each
gene in this set is displayed here as a horizontal strip. For each gene, the
ratio of mRNA levels in fibroblasts at the indicated time after serum
stimulation ("unsync" denotes exponentially growing cells) to its level in the
serum-deprived (time zero) fibroblasts is represented by a color, according to
the color scale at the bottom. The graphs show the average expression profiles
for the genes in the corresponding "cluster" (indicated by the letters A to J
and color coding). In every case examined, when a gene was represented by more
than one array element, the multiple representations in this set were seen to
have identical or very similar expression profiles, and the profiles
corresponding to these independent measurements clustered either adjacent or
very close to each other, pointing to the robustness of the clustering
algorithm in grouping genes with very similar patterns of expression.

[Fig. 2 image labels: Cluster A (100 genes); Cluster B (142 genes); color scale
"Fold repression" (>6, >4, >2); time axis in hours, ending with "Unsync".]
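
The selection and grouping step described above can be sketched as follows. This
is a minimal illustration, not the authors' procedure: the fold cutoff, the
minimum number of time points, the use of plain Pearson correlation on log
ratios, and all function names are assumptions made for the example, and the
procedure cited as (6) may use a different similarity measure. The sketch shows
only the first merge of an agglomerative clustering.

```python
import math
from itertools import combinations

def select_changed(profiles, fold=2.2, min_points=2):
    """Keep genes whose ratio to time zero deviates by >= fold in >= min_points samples."""
    cutoff = math.log2(fold)
    return {
        gene: ratios
        for gene, ratios in profiles.items()
        if sum(1 for r in ratios if abs(math.log2(r)) >= cutoff) >= min_points
    }

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def most_similar_pair(profiles):
    """Return (gene_a, gene_b, r) for the pair with the most similar log-ratio profiles."""
    logs = {g: [math.log2(r) for r in ratios] for g, ratios in profiles.items()}
    best = None
    for a, b in combinations(sorted(logs), 2):
        r = pearson(logs[a], logs[b])
        if best is None or r > best[2]:
            best = (a, b, r)
    return best

# Hypothetical ratios (sample / time zero) at four time points.
example = {
    "geneA": [1.0, 2.5, 4.0, 3.0],
    "geneB": [1.0, 2.4, 4.2, 2.9],
    "geneC": [1.0, 0.4, 0.25, 0.3],
    "geneD": [1.0, 1.1, 0.9, 1.2],
}
selected = select_changed(example)
pair = most_similar_pair(selected)
```

In a full hierarchical clustering, the most similar pair would be merged into a
node and the search repeated until a single tree remains.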

One measure of the reliability of the changes we observed is inherent in the
expression profiles of the genes. For most genes whose expression levels
changed, we could see a gradual change over a few time points, which thus
effectively provided independent measurements for almost all of the
observations. An additional check was provided by the inclusion of duplicate
and, in a few cases, multiple array elements representing the same gene for
about 5% of the genes included in this microarray. In addition, three
independent hybridizations to different microarrays with mRNA samples from
cells harvested 8 hours after serum addition showed good correlation (Fig. 1).
As an independent test, we measured the expression levels of several genes
using the TaqMan 5' nuclease fluorigenic quantitative polymerase chain reaction
(PCR) assay (9). The expression profiles of the genes, as measured by these two
independent methods, were very similar (Fig. 3) (10).

The transcriptional response of fibroblasts to serum was extremely rapid. The
immediate response to serum stimulation was dominated by genes that encode
transcription factors and other proteins involved in signal transduction. The
mRNAs for several genes [including c-FOS, JUN B, and mitogen-activated protein
(MAP) kinase phosphatase-1 (MKP1)] were detectably induced within 15 min after
serum stimulation (Fig. 4, A and B). Fifteen of the genes that were observed to
be induced by serum encode known or suspected regulators of transcription
(Fig. 4B). All but one were immediate-early genes; their induction was not
inhibited by cycloheximide (11). This class of genes could be distinguished
into those whose induction was transient (Fig. 2, cluster E) and those whose
mRNA levels remained induced for much longer (Fig. 2, clusters I and J). Some
features of the immediate response appeared to be directed at adaptation to the
initiating signals. We observed a marked induction of mRNA encoding MKP1, a
dual-specificity phosphatase that modulates the activity of the ERK1 and ERK2
MAP kinases (12). The coincidence of the peak of expression of genes in cluster
E (Fig. 2) with that of MKP1 (Fig. 4A) suggests the possibility that continued
activity of the MAP kinase pathway is required to maintain induction of these
genes but not of those with sustained expression (clusters I and J). The gene
encoding a second member of the dual-specificity MAP kinase phosphatase family,
known as dual-specificity protein phosphatase 6/Pyst2, was induced later, at
about 4 hours after serum stimulation. Genes encoding diverse other proteins
with roles in signal transduction, ranging from cell-surface receptors [for
example, the sphingosine 1-phosphate receptor (EDG-1), the vascular endothelial
growth factor receptor, and the type II BMP receptor] to regulators of
G-protein signaling (for example, NET1/p115 rho GEF) to DNA-binding
transcription factors, were induced by serum (Fig. 4A).

The reprogramming of the regulatory circuits in response to serum involved not
only induction of transcription factors but also reduced expression of many
transcriptional regulators, some of which may play roles in maintaining the
cells in G0 or in priming them to react to wounding (Fig. 4C). Perhaps as a
consequence of the historical focus on genes induced by serum stimulation of
fibroblasts, the set of transcription factors whose expression diminished upon
serum stimulation has been less well characterized.

Genes known or likely to be involved in controlling and mediating the
proliferative response showed distinctive patterns of regulation. Several genes
whose products inhibit progression of the cell-division cycle, such as p27
Kip1, p57 Kip2, and p18, were expressed in the quiescent fibroblasts and
down-regulated before the onset of cell division. The nadir in the mRNA levels
for these genes occurred between 6 and 12 hours after serum stimulation
(Fig. 5A), coincident with the passage of the fibroblasts through G1. The
levels of the transcript encoding the WEE1-like protein kinase, which is
believed to inhibit mitosis by phosphorylation of Cdc2, diminished between 4
and 8 to 12 hours after serum addition (Fig. 5A), well before the onset of M
phase at around 16 hours, raising the possibility of an additional role for
Wee1 in an earlier stage of the cell cycle or in regulating the G0 to G1
transition. Several genes induced in the first few hours after serum
stimulation, such as the helix-loop-helix proteins ID2 and ID3 and EST
AA016305, a gene with homology to G1-S cyclins, are candidates for roles in
promoting the exit from G0.

Genes involved in mediating progression through the cell cycle were
characterized by a distinctive pattern of expression (Fig. 2, cluster D),
reflecting the coincidence of their expression with the reentry of the
stimulated fibroblasts into the cell-division cycle. The stimulated fibroblasts
replicated their DNA about 16 hours after serum treatment. This timing was
reflected by the induction of mRNA encoding both subunits of ribonucleotide
reductase and PCNA, the processivity factor for DNA polymerase epsilon and
delta. Cyclin A, Cyclin B1, Cdc2, and CDC28 kinase, regulators of passage
through the S phase and the transition from G2 to M phase, were induced at
about 16 to 20 hours after serum addition. The kinase in the Cyclin B1-CDK pair
needs to be activated by phosphorylation. The gene encoding Cyclin-dependent
kinase 7 (CDK7; a homolog of Xenopus MO15 cdk-activating kinase) was induced in
parallel with the Cdc2 and Cdc28 kinases (Fig. 5A), suggesting a potential role
for CDK7 in mediating M phase. DNA topoisomerase II alpha, required for
chromosome segregation at mitosis; Mad2, a component of the spindle checkpoint
that prevents completion of mitosis (anaphase) if chromosomes are not attached
to the spindle; and the kinetochore protein CENP-F all showed a similar
expression profile.

In the hours after the serum stimulus, one of the most striking features of the
unfolding transcriptional program was the appearance of numerous genes with
known roles in processes relevant to the physiology of wound healing. These
included both genes involved in the direct role played by fibroblasts in
remodeling the clot and the extracellular matrix and, more notably, genes
encoding proteins involved in intercellular signaling (Fig. 5). Genes induced
in this program encode products that can (i) participate in the dynamic process
of clotting, clot dissolution, and remodeling and perhaps contribute to
hemostasis by promoting local vasoconstriction (for example, endothelin-1);
(ii) promote chemotaxis and activation of neutrophils (for example, COX2) and
recruitment and extravasation of monocytes and macrophages (for example, MCP1);
(iii) promote chemotaxis and activation of T lymphocytes [for example,
interleukin-8 (IL-8)] and B lymphocytes (for example, ICAM-1), thus providing
both innate and antigen-specific defenses against wound infection and
recruiting the phagocytic cells that will be required to clear out the debris
during remodeling of the wound; (iv) promote angiogenesis and
neovascularization (for example, VEGF) through newly forming tissue; (v)
promote migration and proliferation of fibroblasts (for example, CTGF) and
their differentiation into myofibroblasts (for example, Vimentin); and (vi)
promote migration and proliferation of keratinocytes, leading to
reepithelialization of the wound (for example, FGF7), and promote proliferation
of melanocytes, perhaps contributing to wound hyperpigmentation (for example,
FGF2).

Coordinated regulation of groups of genes whose products act at different steps
in a common process was a recurring theme. For example, Furin, a
prohormone-processing protease required for one of the processing steps in the
generation of active endothelin, was induced in parallel with induction of the
gene encoding the precursor of endothelin-1 (Fig. 5E) (13). Conversely,
expression of CALLA/CD10, a membrane metalloprotease that degrades endothelin-1
and other peptide mediators of acute inflammation, was reduced. A second
example is provided by a set of five genes involved in the biosynthesis of
cholesterol (Fig. 5I). The mRNAs encoding each of these enzymes showed sharply
diminished expression beginning 4 to 6 hours after serum stimulation of
fibroblasts. A likely explanation for the coordinated down-regulation of the
cholesterol biosynthetic pathway is that serum provides cholesterol to
fibroblasts through low-density lipoproteins, whereas in the absence of the
cholesterol provided by serum, endogenous cholesterol biosynthesis in
fibroblasts is required.

Fig. 3. Independent verification of microarray quantitation. Relative mRNA
levels of the indicated genes (Mast, mast/stem cell growth factor receptor)
were measured with the TaqMan 5' nuclease fluorigenic quantitative PCR assay
(9) (left) in the same samples that were used to prepare probes for microarray
hybridizations (right). Data from the TaqMan analysis were normalized to mRNA
concentrations and plotted relative to the level at [...]

www.sciencemag.org SCIENCE VOL 283 1 JANUARY 1999

Many of the previously studied genes that we observed to be regulated in this
program have no recognized role in any aspect of wound healing or fibroblast
proliferation. Their identification in this study may therefore point to
previously unknown aspects of these processes. A few selected genes in this
group are shown in Fig. 5H. The stanniocalcin gene, for example (Fig. 5H),
encodes a secreted protein without a clearly identified function in human cells
(14, 15). Its induction in serum-stimulated fibroblasts suggests the
possibility that it may play a role in the wound-healing process, perhaps
serving as a signal in mediating inflammation or angiogenesis.

Fig. 4. "Reprogramming" of fibroblasts. Expression profiles of genes whose
function is likely to play a role in the reprogramming phase of the response
are shown with the same representation as in Fig. 2. In the cases in which a
gene was represented by more than one element in the microarray, all
measurements are shown. The genes were grouped into categories on the basis of
our knowledge of their most likely role. Some genes with pleiotropic roles were
included in more than one category.

One of the most important results of this exploration was the discovery of over
200 previously unknown genes whose expression was regulated in specific
temporal patterns during the response of fibroblasts to serum. For example, 13
of the 40 genes in cluster D (Fig. 2) have descriptive names that reflect their
putative function. Nine of these 13 genes (69%) encode proteins that play roles
in cell cycle progression, particularly in DNA replication and the G2-M
transition. This enrichment for cell cycle-related genes suggests that some of
the unnamed genes in this cluster (for example, EST W793n and EST RI3l4d,
neither of which has sequence similarity to previously characterized genes) may
represent previously unknown genes involved in this part of the cell cycle.
Similarly, a remarkable fraction of genes that were grouped into cluster F on
the basis of their expression profiles encoded proteins involved in
intercellular signaling (Fig. 2), suggesting that a similar role should be
considered for the many unnamed genes in this cluster. A disproportionately
large fraction of the genes whose transcription diminished upon serum
stimulation were unnamed ESTs.

Our intention was to use this experiment as a model to study the control of the
transition from G0 to a proliferating state. However, one of the defining
characteristics of genome-scale expression profiling experiments is that the
examination of so many diverse genes opens a window on all the processes that
actually occur and not merely the single process one intended to observe.
Serum, the soluble fraction of clotted blood, is normally encountered by cells
in vivo in the context of a wound. Indeed, the expression program that we
observed in response to serum suggests that fibroblasts are programmed to
interpret the abrupt exposure to serum not as a general mitogenic stimulus but
as a specific physiological signal, signifying a wound. The proliferative
response that we originally intended to study appeared to be part of a larger
physiological response of fibroblasts to a wound. Other features of the
transcriptional response to serum suggest that the fibroblast is an active
participant in a conversation among the diverse cells that work together in
wound repair, interpreting, amplifying, modifying, and broadcasting signals
controlling inflammation, angiogenesis, and epithelial regrowth during the
response to an injury.

Fig. 5. The transcriptional response to serum suggests a multifaceted role for
fibroblasts in the physiology of wound healing. The features of the
transcriptional program of fibroblasts [...]

We recognize that these in vitro results almost certainly represent a distorted
and incomplete rendering of the normal physiological response of a fibroblast
to a wound. Moreover, only the responses elicited directly by exposure of
fibroblasts to serum were examined. The subsequent signals from other cellular
participants in the normal wound-healing process would certainly provoke
further evolution of the transcriptional program in fibroblasts at the site of
a wound, which this experiment cannot reveal. Nevertheless, we believe that the
picture that emerged strongly suggests a much larger and richer role for the
fibroblast in the orchestration of this important physiological process than
had previously been suspected.






References and Notes

1. J. A. Winkles, Prog. Nucleic Acid Res. Mol. Biol. 58, 41 (1998).

2. A normal human diploid fibroblast [...] (ATCC CRL 2091) in passage 8 was
used in these experiments. The protocol followed for growth [...] was
essentially that of (16) and (17). Cells were grown to about 60% confluence in
15-cm petri dishes in Dulbecco's minimum essential medium containing glucose
(1 g/liter), the antibiotics penicillin and streptomycin, and 10% (by volume)
FBS (Hyclone) that had been previously heat inactivated at 56°C for 30 min. The
cells were then washed three times with the same medium lacking FBS, and
low-serum medium (0.1% FBS) was added to the plates. After a 48-hour
incubation, the medium was replaced with fresh medium containing 10% FBS. mRNA
was isolated from several plates of cells harvested before serum stimulation;
this mRNA served as the serum-starved or time-zero reference sample. Cells were
harvested from batches of plates at 11 subsequent intervals (15 min, 30 min, 1,
2, 4, 6, 8, 12, 16, 20, and 24 hours) after the addition of serum. mRNA was
also isolated from exponentially growing fibroblasts (not subjected to serum
starvation). mRNA was isolated with the FastTrack mRNA isolation kit
(Invitrogen), which involves lysis of the cells on the plate. The growth medium
was removed, and the cells were quickly washed with phosphate-buffered saline
at room temperature. The lysis buffer was added [...] transferred to tubes, and
frozen in liquid nitrogen [...]

3. The National Center for Biotechnology Information 

ZT^^ "''^ <*'t-baseTrSUT^r 
W««ng human sequences comained in Cenaa^ 
casters representing disiina tranjoipts or per«s (IB 

^''a^^o^ ^^^^^ 

U«ned^4a000such dusterv We seteaed a subset 

SLlZTTtJ!!^^ UniCene dusten were 
1^ '!:^^'^ « °« ^ from 
tf* LKACE. human cONA colieaion (20). so that a 

^»e available commercially from a of 
vwKters). We attempted to include as complete as 
P«»W. a set of the -n*ned- human 
^and genes that appeared to be dosSy^ttd^ 

2^CL1**'^ an additior^^ 

^ rema^mg 4000 dones were dwsen from 

the ano^mKKis- UniCene dustea on the basis 
of mduswn on the human transcript map fwww^ 
"^-^/SC1£NCE96/) ^ t^ckTj^n^ 
-o*^to other genes in t^ seleaed^ 
^ «P«nting ead, of the seleaed genes wL 

«*»^ «"» more rtcem -IStt set- described at ^ 

2^«don^472 are absent from the current edition of 
^^™Cen^ were presumed to be distina germ. The 

n^sem 8613 different genes, were used to print 
m-^cr^ accordir, to methods described previ<S^ 

«ntiy labeled cDNA probes for hybridizing to tfTrS- 
^„ . ''l^'*' senim-starved ceUs w« 

^c^ZZ^ ^""^ ^' wrum-starved celts served as 
m^KiA^? P'"** ^ hybridtiations 

f^^^'* ^''^ irrXiatelyr: 
^^^^'"^ '"'"'^ •'^^ serumlLu- 

ISa^^? in ^ ^y^' ^ograms of 

rf^^ r^f °' Potydeoxyadenylic add and 20 
« of human CoTI DNA (Cibco-BRL) vJere added to the 
^^l^"^"* « a soJion contai^^^a": 

SOS and auLed 

ISe^^SJ^"" !L ^'"^P'"^"^ tor 30 min before 
^tndwt^m. washes, and fluorescent scans were per- 
te^d « descnbed previously (23. 24). AU measuTe- 
n^toutirtg more than 180.000 differential expres- 

b« 'or analysa and interpretation. 

abou2750) on the microarray were verif^d by se- 

o^^Z^^^ «™iat,on. as well as a large number 

^ JT"**^ ***** ^"8* substantially 

coi^e of this experiment. About 85% of the 

w^d^I?K^* °' '"'^"'Tay that 

were AeAed by resequendng were correaly identified. 

reconfirrned by resequendng. In the cases 
wh^ a human gene has more than one name in the 
^t^* ^ '° « "»st 

raiy identificaton number (fomwt: S(D####««) anda 
c^lTJ^T'^ P*"^'"* ^^'^^ verification. The 
wT^!?7"**' P°"«* « our 

ZTrJ^r^^^JT^"^ as they 

are confim^d by resequendng. ^ 

6. M. B. Eisen, P. T. Spellman, P. O. Brown, D. Botstein, Proc. Natl. Acad.
Sci. U.S.A. 95, 14863 (1998).
7. Genes were selected for this analysis if either (i) their expression level
[...] quiescent fibroblasts by at least a factor of 2.20 in at least two of the


»*;^fr«^««^^ 

fv the set d 13 w>i.^ -# # 
N DccniM of noiH h Ok dna. --w 

»^'or m..t/.t«„ c«U growth fetor rtc-tj?^ 
•onwwtat low in th. micro«r» d»u ThijS^^T 
«^ » probably . cons^^^u^^^Z 
b»ekpound subUKtioo mtthod m«d iL^^^^ 

•«« uMd «r, « follow COX2r(-rTr/-?Tii^ 

V. R. lyw a */.. unpubliihed data. The «n. «»< 
.»n data for th. .arty tin» poi«, iX^^S^ 
cy^ximid. win b, avai«, at 0^,^^"^^ 
(g7»m.^.ttartford«)u/,«„m) 

12. T. Hunter, Cell 80, 225 (1995).

13. J-^ l|PP.l»ou, and H. RuAoaho. Am,. »«. ,4. 153 

ri4 (^5^^ ^ • ^'*»* fVXmoJ. 2M. 

3«'i,9?r °- ''-^ 

19. G. D. Schuler, J. Mol. Med. 75, 694 (1997).

20. G. Lennon, C. Auffray, M. Polymeropoulos, M. B. Soares, Genomics 33, 151
(1996).

21. I.M-A.C.E. don« w«, amplifirt ty PCR fei 96--n 

PCR products were suspended at a fmT-_7^_Z- j 
*;t^ onto co««l giao by ^U-^WSIS 

nft^tn*!:* «ray«) onto «,'a4. of lii^ ^ 
1.8 cm wrth th. rt«i»nu ipaod 175 lun aoartlS 
rn|aoa,T.y.w«, thenpo,,;^^^^;^ 

f^" •VtHdi«ion With a^^Si^ 

22. M. Schena, D. Shalon, R. W. Davis, P. O. Brown, Science 270, 467 (1995).

(1997?'""' " °- *«• MO 

OH, tor h.lp with v«i(ic«initanadrto 
.dvK. on th, T«,Man .»ay. and IjSmv^^ 
m.mb«, e( th, P.OA «^ OB. lis fo d^S^T 
Supports by a gram from th. ^uT^UTH^T^ 

National C««., ,n„Hut, (NIH ui^J^l^^ 

tOf.l f dlow n Compotatio.1,1 rio(«uUr Bioto« «1 
O.TJ( „ a Wrti« and Idun e«,y fdtow PM i.^ 



13 August 1998; accepted 13 November 1998






EXHIBIT E

Docket No.: PC-0044 CIP

USSN: 09/895,686

© 2000 Nature America Inc. • http://genetics.nature.com



Systematic variation in gene expression
patterns in human cancer cell lines

Douglas T. Ross, Uwe Scherf, Michael B. Eisen, Charles M. Perou, Christian Rees,
Paul Spellman, Vishwanath Iyer, Stefanie S. Jeffrey, Matt Van de Rijn, Mark
Waltham, Alexander Pergamenschikov, Jeffrey C.F. Lee, Deval Lashkari, Dari
Shalon, Timothy G. Myers, John N. Weinstein, David Botstein
& Patrick O. Brown

We used cDNA microarrays to explore the variation in expression of approximately
8,000 unique genes among the 60 cell lines used in the National Cancer
Institute's screen for anti-cancer drugs. Classification of the cell lines based
solely on the observed patterns of gene expression revealed a correspondence to
the ostensible origins of the tumours from which the cell lines were derived.
The consistent relationship between the gene expression patterns and the tissue
of origin allowed us to recognize outliers whose previous classification
appeared incorrect. Specific features of the gene expression patterns appeared
to be related to physiological properties of the cell lines, such as their
doubling time in culture, drug metabolism or the interferon response.
Comparison of gene expression patterns in the cell lines to those observed in
normal breast tissue or in breast tumour specimens revealed features of the
expression patterns in the tumours that had recognizable counterparts in
specific cell lines, reflecting the tumour, stromal and inflammatory components
of the tumour tissue. These results provided a novel molecular characterization
of this important group of human cell lines and their relationships to tumours
in vivo.

[...] of human tumours and normal tissue [...] to more than 70,000 different
chemical compounds, including all common chemotherapeutics
(http://dtp.nci.nih.gov). A previous analysis of these data revealed a
connection between the pattern of activity of a drug and its method of action.
In particular, [...]

[...] report explores the relationship [...]
(http://genome-www.stanford.edu/nci60 and http://discover.nci.nih.gov).

Results

We studied gene expression in the 60 cell lines [...]
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fig. 1 Gent nprcuion panernt related to the tnsue o< ., .- 

.ion.1 hierarchical tlurterinfl wa. applieS to eror^Tlr^. . ^ ''"^ TwoKlimen. 
m.a.u,ed atro« 64 cell lin«. The I J61 iNAs^H Z^-'A 'llZ' ""n"* ' 
levehtha, varied by at lea« .e«„fo,d (l<^ (r^i^^ji^^a ° ""^'"^ 
least 4 of 60 cell line». Thb effectively «l«ed ^•^it, * '•<««nce pool in a, 

.ion le»l aa«. the 60 cell line, OncLd^^'^^'T^^^^Pl S;""" """i"" "P'«- 
ence pooO. and therefore highlighted th^'tC^™' 

di«in9ui.h«l«»cellline.fromL.,X,^ur~ ^P^^^ that be,, 

for each cell line plu, the two »Mit,on^^^^J^ hybrKfuafon, «,e,e »ed, one 
«ne. .562 and ^n. The two IZZ:T"' «" 

weighted for the gene cheering » that each ofZ m «in ""«P»"*'<9'y 
clunering. The .elHine dendrograrrv w^t^ ,^ "„?/L'"^^^ ••"""^ '» '"e 

o.ten«ble tiuu. of origin of the «ll irnr«r,d i J ! branches coloured to reflect the 

P.^ prosute; llgM blueTung: o^a^" o7a 

black, .mknown (NCI/ADR-MS)). ^i' "o tl^ ^l "r^T''^*^- "-"noma; 
relation coeffici«« represented by t^ ter^h oVC 1^ «*end,ogram depicu the cor- 
pair, of nodes. No,, t4t the two ,,ip,e«^Trep^t^d1^Sr«,""^' """"'"9 
tightly together and were well differentia,^ f'l^^^^the m~.c,n J, 
indicating that this clustering of cell linn i< h...rf-^ '*'>' 
e«p,e.s::rn pattern, rXr than Vr^^et^ o^ thVe-H^^^^^^ 

representation of the data table, with the rom TJ2,"' *• * coloured 

order. The dendrogram representing f^'!:^«TJ^::^L::^^:'^J::"'^> """" 
ted for clarity, but b available (httpaS(genome.«~w ,t.nir^ll , ^, 
cell of this table reflects the ■'''-•^T^Ti^Z^T^^^T^^- 1** 
(column). The colour Kale used to reore^m * «" 

3.-3d in«,,,efer,othec.uste^;°;:^":::;^*;:^'.^,,;-r ■» • The ..be,. 




Each hybridization compared Cy5-labelled cDNA reverse transcribed [...]
sample. This reference sample, used in all hybridizations, was
prepared by combining an equal mixture of mRNA from 12 of
the cell lines (chosen to maximize diversity in gene expression as
determined primarily from two-dimensional gel studies). By
comparing cDNA from each cell line with a common reference,
variation in gene expression across the 60 cell lines could be
inferred from the observed variation in the normalized Cy5/Cy3
ratios across the hybridizations.
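
The reason a common reference permits comparison between cell lines can be
shown with a short sketch: the reference term cancels when two log ratios are
subtracted. The function names and intensity values below are hypothetical, and
the normalization step mentioned in the text is not modelled here.

```python
import math

def log_ratios_vs_reference(sample_intensities, reference_intensities):
    """log2(Cy5/Cy3) per gene for one hybridization against the shared reference."""
    return [math.log2(s / r) for s, r in zip(sample_intensities, reference_intensities)]

def fold_between_lines(log_ratio_a, log_ratio_b):
    """Fold difference A/B for one gene, given log2 ratios of A and B vs. the same reference."""
    # log2(A/ref) - log2(B/ref) = log2(A/B): the reference cancels out.
    return 2 ** (log_ratio_a - log_ratio_b)

# Hypothetical intensities for two genes in two cell lines.
line_a = log_ratios_vs_reference([400.0, 50.0], [100.0, 100.0])
line_b = log_ratios_vs_reference([50.0, 200.0], [100.0, 100.0])
```

Because every array shares the same denominator, differences between log ratios
reflect differences between cell lines rather than properties of the reference.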
To assess the contribution of artefactual sources of variation in
[...] expression patterns, K562 and
MCF7 cell lines were each grown in three independent cultures,
and the entire process was carried out independently on mRNA
extracted from each culture. The variance in the triplicate
fluorescence ratio measurements approached a minimum when the
fluorescence signal was greater than approximately 0.4% of the
measurable total signal dynamic range above background in
either channel of the hybridization. We selected the subset of
spots for which significant signal was present in both the numerator
and denominator of the ratios by this criterion to identify
the best-measured spots. The pair-wise correlation coefficients
for the triplicates of the set of genes that passed this quality
control ([...] spots for the MCF7 samples and [...]
spots for K562) ranged from 0.83 to 0.92 (for graphs and details,
see http://genome-www.stanford.edu/nci60).
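
The quality filter and replicate comparison described above might be sketched
as follows. This is an illustration only: the function names, the assumption
that intensities are already background-subtracted, and the example numbers are
hypothetical, with the 0.4% fraction taken from the text.

```python
import math
from itertools import combinations

def well_measured(spots, dynamic_range, fraction=0.004):
    """Indices of spots whose signal exceeds fraction * dynamic_range in both channels.

    `spots` holds (cy5, cy3) intensities, assumed already background-subtracted.
    """
    threshold = fraction * dynamic_range
    return [i for i, (cy5, cy3) in enumerate(spots) if cy5 > threshold and cy3 > threshold]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pairwise_correlations(replicates):
    """Correlation for every pair of replicate ratio vectors (three pairs for a triplicate)."""
    return [pearson(a, b) for a, b in combinations(replicates, 2)]

# Hypothetical data: four spots on a 16-bit scanner, and three replicate log-ratio vectors.
kept = well_measured([(500.0, 300.0), (100.0, 800.0), (900.0, 50.0), (400.0, 270.0)], 65535)
corrs = pairwise_correlations([
    [1.0, 2.0, 3.0, 4.0],
    [1.1, 1.9, 3.2, 3.9],
    [0.9, 2.2, 2.8, 4.1],
])
```

Requiring signal in both channels protects the ratio from being dominated by
noise in either its numerator or its denominator.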

To make the orderly features in the data more apparent, we used
a hierarchical clustering algorithm and a pseudo-colour visualization
[...] genes and to [...]
genes whose expression level varied among the 60 cell lines in a
similar manner. Clustering was performed twice using different
subsets of genes to assess the robustness of the analysis.
[...] concentrated on those genes that showed the most
variation in expression among the 60 cell lines (1,167 total). A second
analysis (Fig. 2) included all spots that were thought to be well
measured in the reference set (6,831 spots).

The most notable property of the clustered data was that cell lines
with common presumptive tissues of origin grouped together [...]
central nervous system, colon, renal and ovarian tissue were clustered
[...] terminal branches specific to their respective
organ types with few exceptions. Cell lines derived from
[...] tumours were distributed
in multiple different terminal branches, suggesting that their gene
expression patterns were more heterogeneous.

Many of these coherent cell line clusters were distinguished by
the specific expression of characteristic groups of genes
(Fig. 3a-d). For example, a cluster of approximately 90 genes was
highly expressed in the melanoma-derived lines (Fig. 3c). This set
was enriched for genes with known roles in melanocyte biology,
including tyrosinase and dopachrome tautomerase (TYR and
DCT; subunits of an enzyme complex involved in melanin
synthesis), MART1 (MLANA; which is being investigated as a
target for immunotherapy of melanoma) and S100-beta (S100B;
which has been used as an antigenic marker in the diagnosis of

Fig. 2 Gene expression patterns related to other cell-line phenotypes. a, We
applied two-dimensional hierarchical clustering to expression data from a set
of 6,831 cDNAs measured across the 64 cell lines. The 6,831 cDNAs were those
with a minimum fluorescence signal intensity of approximately 0.4% of the
dynamic range above background in the reference channel in each of the six
hybridizations used to establish reproducibility. This effectively selected
those spots that provided the most reliable ratio measurements and therefore
identified a subset of genes useful for exploring patterns comprised of those
whose variation in expression across the 60 cell lines was of moderate
magnitude. b, Cluster-ordered data table. c, Doubling time of cell lines. Cell
lines are given in cluster order. Values are plotted relative to the mean.
Doubling times greater than the mean are shown in green, those with doubling
time less than the mean are shown in red. d, Three related gene clusters that
were enriched for genes whose expression level variation was correlated with
cell line proliferation rate. Each of the three gene clusters (clustered solely
on the basis of their expression patterns) showed enrichment for sets of genes
involved in distinct functional categories (for example, ribosomal genes versus
genes involved in pre-RNA splicing). e, Gene cluster in which all characterized
and sequence-verified cDNAs encode genes known to be regulated by interferons.
f, Gene cluster enriched for genes that have been implicated in drug metabolism
(indicated by asterisks). A further property of the gene clustering evident
here and in Fig. 2 is the strong tendency for redundant representations of the
same gene to cluster immediately adjacent to one another, even within larger
groups of genes with very similar expression patterns. In addition to
illustrating the reproducibility and consistency of the measurements, and
providing independent confirmation of many of our measurements, this property
also demonstrates that these, and probably all, genes have nearly unique
patterns of variation across the 60 cell lines. If this were not the case, and
multiple genes had identical patterns of variation, we would not expect to be
able to distinguish, by clustering on the basis of expression variation,
duplicate copies of individual genes from the other genes with identical
expression patterns.
[...] isolated from a patient with melanoma [...] identification of
melanoma cells. Paradoxically, two related cell lines (MDA-MB435 and
MDA-N), which were derived from a single patient with breast cancer
and have been conventionally regarded as breast cancer cell lines,
shared expression of the genes associated with melanoma. MDA-MB435 was
isolated from a pleural effusion in a patient with metastatic ductal
adenocarcinoma of the breast. It remains possible that the origin of
the cell line was a breast cancer, and that its gene expression pattern
is related to the neuroendocrine features of some breast cancers. But
our results suggest that this cell line may have originated from a
melanoma, raising the possibility that the patient had a co-existing
occult melanoma.

[...] corresponding differences in activity of these processes in the
[...] necessary for progression through the [...] translation
elongation factors) and traditional [...] proliferating cells (MKI67).
[...] clusters enriched for genes [...]

The higher-level organization of the cell-line tree, in which
groups span cell lines from different tissue types, also reflected
shared biological properties of the tissues from which the cell
lines were derived. The carcinoma-derived cell lines were divided
into major branches that separated those that expressed genes
characteristic of epithelial cells from those that expressed genes
more typical of stromal cells. A cluster of genes is shown (Fig. 3)
that is most strongly expressed in cell lines derived from colon
carcinomas, six of seven ovarian-derived cell lines and the two
breast cancer lines positive for the oestrogen receptor. The named
genes in this cluster have been implicated in several aspects of
epithelial cell biology. The cluster was enriched for genes whose
products are known to localize to the basolateral membrane of
epithelial cells, including those encoding components of
adherens complexes (for example, desmoplakin (DSP),
periplakin (PPL) and plakoglobin (JUP)), an epithelial-expressed
cell-cell adhesion molecule (M4S1) and a sodium/hydrogen ion
exchanger (SLC9A1). It also contained genes



expression of these ribosomal genes was significantly correlated
with variation in the cell doubling time (correlation coefficient of
0.54), supporting the notion that the genes in this cluster were
[...] proliferation rate or growth rate in [...]

In a smaller gene cluster (Fig. 2e), all of the named genes were
previously known to be regulated by interferons. Additional
groups of interferon-regulated genes showed distinct patterns of
expression (data not shown), suggesting that the NCI60 cell lines
exhibited variation in activity of interferon-response pathways,
which was reflected in gene expression patterns.

Another cluster (Fig. 2f) contained several genes encoding
proteins with possible interrelated roles in drug metabolism,
including glutamate-cysteine ligase (GLCLC, the enzyme responsible
for the rate-limiting step of glutathione synthesis), thioredoxin
(TXN) and thioredoxin reductase (TXNRD1; enzymes
involved in regulating redox state in cells), and MRP1 (a drug
transporter known to efficiently transport glutathione-conjugated
compounds). The elevated expression of this set of genes
in a subset of these cell lines may reflect selection for resistance to
chemotherapeutics.



[...] epithelial-expressed tumour suppressor (LLGL1) and a homeobox
gene thought to control calcium-mediated adherence in epithelial cells
(MSX2).

In contrast, a separate, major branch of the cell-line dendrogram
(Fig. 1a) included all glioblastoma-derived cell lines, [...]
renal-cell-carcinoma-derived cell lines and the remaining
carcinoma-derived lines. The characteristic set of genes expressed in
this cluster included many whose products are involved in stromal cell
functions (Fig. 3d). Indeed, the two cell lines originally described as
'sarcoma-like' in appearance (Hs578T, breast carcinosarcoma, and SF539,
gliosarcoma) expressed most of these genes. Although no single gene was
uniformly characteristic of this cluster, each cell line showed a
distinctive pattern of expression of genes encoding proteins with roles
in synthesis or modification of the extracellular matrix (for example,
caldesmon (CALD1), cathepsins, thrombospondin (THBS), lysyl oxidase
(LOX) and collagen subtypes). Although the ovarian and most
non-small-cell-lung-derived carcinomas expressed genes characteristic
of both epithelial cells and stromal cells, they probably clustered
with the CNS and renal cell carcinomas in this analysis because genes
characteristically expressed in stromal cells were more abundantly
represented in this gene set.

[...] clinical samples

Tumours of the breast typically have a complex histological
organization, with connective tissue [...] tumour cells. To [...] gene
expression in the [...] a framework for interpreting the [...] compared
RNA [...] breast cancer biopsy samples, a sample of normal [...]

Physiological variation reflected
in gene expression patterns

A cluster diagram of 6,831 genes (Fig. 2) is useful for exploring
clusters of genes whose variation in mRNA levels was not obviously
attributable to cell or tissue type. We identified some gene
clusters that were enriched for genes involved in specific cellular
[...] expression pattern shared between the cancer specimens and
individual cell lines derived from breast cancers and leukaemias.
The genes encoding keratin 8 (KRT8) and keratin 19 (KRT19), [...] genes
defined in the complete NCI60 cell line cluster, were expressed in both
of the biopsy samples and the two breast-derived cell lines, MCF-7 and
T47D, expressing the oestrogen receptor, suggesting that these
transcripts originated in tumour cells with features similar to those
of [...] characteristic of stromal cells, including collagen genes
[...] and smooth muscle cell markers (TAGLN), was a feature shared by
the tumour sample and the stromal-like cell lines Hs578T and BT549
(Fig. 5b). This feature of the expression pattern seen in the tumour
samples is likely to be due to the stromal component of the tumour.
The tumours also shared expression of a set of genes (Fig. 5c) with the
multiple myeloma cell line (RPMI-8226), notably including
immunoglobulin genes, consistent with the presence of B cells in the
tumour (this was confirmed by staining with anti-immunoglobulin
antibodies; data not shown). Therefore, distinct sets of genes with
co-varying expression among the samples (Fig. 4, arrow) appear to
represent distinct cell types that can be distinguished in breast
cancer tissue. A fourth cluster of genes, more highly expressed in all
of the cell lines than in any of the clinical specimens, was enriched
for genes present in the 'proliferation' cluster described above
(Fig. 5d). The variation in expression of these genes likely paralleled
the difference in proliferation rate between the rapidly cycling
cultured cell lines and the much more slowly dividing cells in tissues.

nature genetics • volume 24 • march 2000

Fig. 3 Gene clusters related to tissue characteristics in the cell
lines. Panel labels: melanoma cluster (16 ESTs); mesenchymal cluster
(67 ESTs). [Remainder of caption illegible.]

Discussion 

Newly available genomics tools allowed us to explore variation in
gene expression on a genomic scale in 60 cell lines derived from
diverse tumour tissues. We used a simple cluster analysis to identify
the prominent features in the gene expression patterns that
appeared to reflect 'molecular signatures' of the tissue from
which the cells originated. The histological characteristics of the
cell lines that dominated the clustering were pervasive enough
that similar relationships were revealed when alternative subsets
of genes were selected for analysis. Additional features of the
expression pattern may be related to variation in physiological
attributes such as proliferation rate and activity of
interferon-response pathways.

The properties of the tumour-derived cell lines in this study
have presumably all been shaped by selection for resistance to
host defences and chemotherapeutics and for rapid proliferation
in the tissue culture environment of synthetic growth media, fetal
bovine serum and a polystyrene substratum. But the primary
identifiable factor accounting for variation in gene expression
patterns among these 60 cell lines was the identity of the tissue
from which each cell line was ostensibly derived. For most of the
cell lines we examined, neither physiological nor experimental
adaptation for growth in culture was sufficient to overwrite the
gene expression programs established during differentiation in
vivo. Nevertheless, the prominence of mesenchymal features in
the cell lines isolated from glioblastomas and carcinomas may
reflect a selection for the relative ease of establishment of cell
lines expressing stromal characteristics, perhaps combined with
physiological adaptation to tissue culture conditions.



Fig. 4 Comparison of the gene expression patterns in clinical breast
cancer [...] the NCI60 breast and leukaemia-derived [...] data from
[...] specimens [...] subset of the NCI60 cell lines to explore whether
features of expression patterns observed in specific lines could be
identified in the [...] cellular components of the tumour specimens.
b, Breast cancer specimen [...] stained with anti-keratin antibodies,
showing the complex [...] characteristically found in breast tumours.
The arrows highlight the [...] cellular components of this [...]
distinguished by the gene expression cluster analysis (Fig. 5).



Biological themes linking genes with related expression patterns
may be inferred in many cases from the shared attributes of
known genes within the clusters. Uncharacterized cDNAs are
likely to encode proteins that have roles similar to those of the
known gene products with which they appear to be co-regulated.
Still, for several clusters of genes, we were unable to discern a
common theme linking the identified members of the cluster. Further
exploration of their variation in expression under more diverse
conditions and more comprehensive investigation of the physiology
of the NCI60 cells may provide insight. The relation of
[...] measured by the DTP is an example of linking variation in gene
expression with more subtle and diverse phenotypic variation.

The patterns of gene expression measured in the NCI60 cell
lines provide a framework that helps to distinguish the cells that
express specific sets of genes in the histologically complex breast
cancer specimens. Although it is now feasible to analyse gene
expression in micro-dissected tumour specimens, this observation
[...] some of the biology of clinical tumour samples by sampling them
[...] in conventional morphological pathology, one
might be able to observe interactions between a tumour and its
microenvironment in this way. These relationships will be clarified
by suitable analysis of gene expression patterns from intact as
well as dissected tumours.

Methods 

[...] as bacterial colonies in 96-well microtitre plates.
Approximately 8,000 distinct Unigene clusters (representing nominally
unique genes) were represented in this set of clones. All genes
identified [...] independent cDNA clones [...] patterns. A single-pass
3' sequence re-verification was attempted for every clone after
re-streaking for single colonies. For a subset of genes for which
quality 3' sequence was not obtained, we attempted to confirm
identities by 5' sequencing. Of the subset of clones selected for 3'
sequence verification on the basis of an interesting pattern of
expression (888 total), 331 were correctly identified, 57 incorrectly
identified, and 500 indeterminate (poor quality sequence). We estimated
that 15-20% of array elements contained [...]. So far, the identities
of ~3,000 clones have been verified. The full list of clones used and
their nominal identities ([...] preceded by the designation 'SID#'
(Stanford identification) represent clones whose identities have not
yet been verified; http://genome-www.stanford.edu:8000/nci60).
Production of cDNA microarrays. The arrays used in this experiment were
[...] Pharmaceuticals). Each insert was amplified from a bacterial
colony by sampling 1 µl of bacterial media and [...] primers for the
[...] (5'-TTGTAAAACGACGGCCAGTG-3', 5'-CACACAGGAAACAGCTATG-3'). Each PCR
product [...] 3xSSC (10 µl). The PCR products were then printed on
treated glass microscope slides using a robot with four printing tips.
Detailed protocols for assembling and operating a microarray printer,
and printing and experimental [...]

Preparation of mRNA and reference pool. Cell lines were grown from NCI
[...] 2 mM) and 5% fetal calf serum. To minimize the contribution of
variation in culture conditions or cell density to differential gene
expression, [...] each cell line to 80% confluence and isolated mRNA
24 h after [...] fresh medium. The time between removal from the
incubator and lysis of the cells in RNA stabilization buffer was
minimized [...] buffer containing guanidium isothiocyanate and total
RNA was purified with the RNeasy purification kit (Qiagen). We purified
[...] using a poly(A) purification kit (Oligotex, Qiagen) [...]
integrity and relative contamination of mRNA with ribosomal [...] were
quickly frozen in liquid nitrogen and stored at [...] increasing to
~20,000 r.p.m. over a period of [...] tumour homogenate as described in
the Trizol protocol [...] step to remove fat. Once total RNA was
obtained [...]
Fig. 5 Histologic features of breast cancer biopsies can be recognized
[...] on expression patterns. Enlargements of the regions of the [...]
diagram in Fig. 4 showing gene clusters enriched for genes expressed in
[...] cancer specimens, as distinguished [...] cultured cell lines.
a, A cluster including many genes characteristic of [...] expressed in
cell lines (T47D and MCF7) derived from breast [...] the oestrogen
receptor and tumours. b, Genes expressed in [...]
We combined mRNA from the following cell lines in equal quantities to
make the reference pool: HL-60 (acute myeloid leukaemia) and K562
(chronic myeloid leukaemia); [...] (non-small-cell lung); COLO
[...]; LOX-IMVI (melanoma); OVCAR-[...] and [...] (ovarian); CAKI-1
(renal); PC-3 (prostate); and MCF7 and Hs578T (breast). The criteria
for selection of the cell lines in the reference are described in
detail in the accompanying manuscript.

Doubling-time calculations. We calculated doubling times based on
routine NCI60 cell line compound screening data, and they reflect the
doubling times for cells inoculated into 96-well plates at the
screening inoculation densities and grown in RPMI 1640 medium
supplemented with 5% fetal bovine serum for 48 h. We measured cell
populations using the sulforhodamine B optical density measurement
assay. The doubling time constant k was calculated using the equation
N/No = e^(kt), where No is optical density for control (untreated)
cells at time zero, N is optical density for control cells after 48-h
incubation, and t is 48 h. The same equation was then used with the
derived k to calculate the doubling time t by setting N/No = 2. For a
given cell line, we obtained No and N values by averaging optical
densities (N>6,000) obtained for each cell line for a year's screening.
Data and experimental details are available (http://dtp.nci.nih.gov).
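The two-step calculation described above (solve for k from the 48-h growth, then solve for t at N/No = 2) can be sketched as follows. This is a minimal illustration that assumes simple exponential growth; the function names and the example optical densities are hypothetical and do not come from the screening data.

```python
import math

def growth_constant(n0, n, t_hours=48.0):
    """Solve N/No = e^(k*t) for k, given optical densities at time zero and at t."""
    return math.log(n / n0) / t_hours

def doubling_time(n0, n, t_hours=48.0):
    """Reuse the same equation with N/No = 2 to obtain the doubling time in hours."""
    k = growth_constant(n0, n, t_hours)
    return math.log(2.0) / k

# Hypothetical averaged optical densities: a fourfold increase over 48 h.
example = doubling_time(n0=0.25, n=1.0, t_hours=48.0)
print(f"doubling time: {example:.1f} h")
```

A fourfold increase over 48 h corresponds to two doublings, which is a convenient sanity check on the arithmetic. The sketch does not guard against N equal to No, which would give k = 0.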



at http://rana.stanford.edu/software). Each spot was defined by manual
positioning of a grid of circles over the array image. For each
fluorescent image, the average pixel intensity within each circle was
determined, and a local background was computed for each spot equal to
the median pixel intensity in a square of 40 pixels in width and height
centred on the [...] pixels within any defined spots. Net signal was
determined by subtraction of this local background from the average
intensity for each spot. Spots deemed unsuitable for accurate
quantitation because of array artefacts were manually flagged and
excluded from further analysis.

Data files generated by ScanAlyze were entered into a custom database
that maintains web-accessible files. Signal intensities between the two
fluorescent images were normalized by applying a uniform scale factor
to all intensities measured for the Cy5 channel. The normalization
factor was chosen so that the mean log(Cy3/Cy5) for a subset of spots
that achieved a minimum quality parameter [...]
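The background subtraction and channel normalization described above can be sketched as below. The target value of the mean log ratio is illegible in this copy, so the sketch assumes it is zero; that assumption, the function names and the example intensities are all illustrative and are not taken from ScanAlyze.

```python
import math
from statistics import mean, median

def net_signal(spot_pixels, background_pixels):
    """Average spot intensity minus the median of the local background pixels."""
    return mean(spot_pixels) - median(background_pixels)

def cy5_scale_factor(cy3, cy5):
    """Uniform factor for Cy5 that makes the mean log(Cy3/Cy5) equal zero.

    ASSUMPTION: the target mean of zero is not legible in the source text.
    """
    log_ratios = [math.log(a / b) for a, b in zip(cy3, cy5)]
    return math.exp(mean(log_ratios))

def normalize_cy5(cy3, cy5):
    """Apply the uniform scale factor to every Cy5 intensity."""
    factor = cy5_scale_factor(cy3, cy5)
    return [b * factor for b in cy5]

# Hypothetical net signals for three well-measured spots.
cy3 = [100.0, 200.0, 400.0]
cy5 = [50.0, 100.0, 200.0]
scaled = normalize_cy5(cy3, cy5)
print(scaled)
```

In practice the factor would be estimated only from the spots that pass the quality filter and then applied to all spots on the array.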



Preparation and hybridization of fluorescent labelled cDNA. For each
comparative array hybridization, labelled cDNA was synthesized by
reverse transcription from test cell mRNA in the presence of Cy5-dUTP,
and from the reference mRNA with Cy3-dUTP, using the Superscript II
reverse-transcription kit (Gibco-BRL). For each reverse transcription
reaction, mRNA (2 µg) was mixed with an anchored oligo-dT
(d-20T-d(AGC)) primer (4 µg) in a total volume of 15 µl, heated to
70 °C for 10 min and cooled on ice. To this sample, we added an
unlabelled nucleotide pool (0.6 µl; 25 mM each dATP, dCTP, dGTP, and
15 mM dTTP), either Cy3 or Cy5 conjugated dUTP (3 µl; 1 mM; Amersham),
5x first-strand buffer (6 µl; 250 mM Tris-HCl, pH 8.3, 375 mM KCl,
15 mM MgCl2), 0.1 M DTT (3 µl) and 2 µl of Superscript II reverse
transcriptase (200 U/µl). After a 2-h incubation at 42 °C, the RNA was
degraded by adding 1 N NaOH (1.5 µl) and incubating at 70 °C for
10 min. The mixture was neutralized by adding 1 N HCl (1.5 µl), and the
volume brought to 500 µl with TE (10 mM Tris, 1 mM EDTA). We added Cot1
human DNA (20 µg; Gibco-BRL), and purified the probe by centrifugation
in a Centricon-30 micro-concentrator (Amicon). The two separate probes
were combined, brought to a volume of 500 µl and concentrated again to
a volume of less than 7 µl. We added 10 µg/µl poly(A) RNA (1 µl; Sigma)
and tRNA (10 µg/µl; Gibco-BRL) and adjusted the volume to 9.5 µl with
distilled water. For final probe preparation, 20xSSC (2.1 µl; 1.5 M
NaCl, 150 mM NaCitrate, pH 8.0) and 10% SDS (0.35 µl) were added to a
total final volume of 12 µl. The probes were denatured by heating for
2 min at 100 °C, incubated at 37 °C for 20-30 min, and placed on the
array under a 22 mm x 22 mm glass coverslip. We incubated slides
overnight at 65 °C for 14-18 h in a custom slide chamber with humidity
maintained by a small reservoir of 3xSSC. Arrays were washed by
submersion and agitation for 2-5 min in 2xSSC with 0.1% SDS, followed
by 1xSSC and then 0.1xSSC. The arrays were 'spun dry' by centrifugation
for 2 min in a slide-rack in a Beckman GS-6 tabletop centrifuge in
Microplus carriers at 650 r.p.m. for 2 min.
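The final probe composition can be checked with the usual dilution relation (stock concentration times volume added, divided by total volume). The sketch below uses the volumes as read from the protocol text, which is partly garbled in this copy; `final_concentration` is a hypothetical helper introduced only for this check.

```python
def final_concentration(stock, volume_added, total_volume):
    """Concentration after diluting `volume_added` of `stock` into `total_volume`."""
    return stock * volume_added / total_volume

# Volumes in microlitres, as read from the protocol text.
probe_volume = 9.5   # labelled probe plus blocking RNA, adjusted with water
ssc_volume = 2.1     # 20x SSC stock
sds_volume = 0.35    # 10% SDS stock
total = probe_volume + ssc_volume + sds_volume  # the text rounds this to 12

ssc_final = final_concentration(20.0, ssc_volume, total)  # in multiples of 1x SSC
sds_final = final_concentration(10.0, sds_volume, total)  # in percent
print(f"total {total:.2f} ul, SSC {ssc_final:.2f}x, SDS {sds_final:.2f}%")
```

The summed volume comes to just under 12 µl, consistent with the stated total, and the hybridization mixture works out to roughly 3.5x SSC and about 0.3% SDS.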



Cluster analysis. We extracted tables (rows of genes, columns of
individual microarray hybridizations) of normalized fluorescence ratios
from the database. Various selection criteria, discussed in relation to
each data set, were applied to select subsets of genes from the 9,703
cDNA elements on the arrays. Before clustering and display, the
logarithms of the measured fluorescence ratios for each gene were
centred by subtracting the arithmetic mean of all ratios measured for
that gene. The centring makes all subsequent analyses independent of
the amount of each gene's mRNA in the reference pool.

We applied a hierarchical clustering algorithm separately to the cell
lines and genes using the Pearson correlation coefficient as the
measure of similarity and average linkage clustering. The results of
this process are two dendrograms (trees), one for the cell lines and
one for the genes, in which very similar elements are connected by
short branches, and longer branches join elements with diminishing
degrees of similarity. For visual display the rows and columns in the
initial data table were reordered to conform to the structures of the
dendrograms obtained from the cluster analysis. Each cell in the
cluster-ordered data table was replaced by a graded colour (pure red
through black to pure green), representing the mean-adjusted ratio
value in the cell. Gene labels in cluster diagrams are displayed here
only for genes that were represented in the microarray by
sequence-verified cDNAs. A complete software implementation of this
process is available (http://rana.stanford.edu/software), as well as
all clustering results (http://genome-www.stanford.edu/nci60).
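The centring and clustering steps above can be illustrated with a small, dependency-free sketch. It is not the authors' software: the base-2 logarithm, the pairwise-average (UPGMA-style) definition of average linkage, and all function names are assumptions made for illustration only.

```python
import math
from itertools import combinations

def centre_log_ratios(ratios):
    """Log-transform one gene's ratios and subtract that gene's mean log ratio."""
    logs = [math.log2(r) for r in ratios]  # base 2 is an assumption
    gene_mean = sum(logs) / len(logs)
    return [x - gene_mean for x in logs]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # undefined for constant vectors

def average_linkage(rows):
    """Agglomerative clustering; returns merges as (members_a, members_b, similarity)."""
    sim = {pair: pearson(rows[pair[0]], rows[pair[1]])
           for pair in combinations(range(len(rows)), 2)}

    def cluster_sim(a, b):
        # Mean of all pairwise correlations between the two clusters.
        vals = [sim[(min(i, j), max(i, j))] for i in a for j in b]
        return sum(vals) / len(vals)

    clusters = [(i,) for i in range(len(rows))]
    merges = []
    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2), key=lambda p: cluster_sim(*p))
        merges.append((a, b, cluster_sim(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

# Three hypothetical genes measured in three hybridizations.
genes = [centre_log_ratios(r) for r in ([1, 2, 4], [2, 4, 9], [4, 2, 1])]
for a, b, s in average_linkage(genes):
    print(a, b, round(s, 3))
```

The merge list encodes the dendrogram: early merges correspond to short branches (high correlation) and later merges to longer ones. The same routine can be applied to the transposed table to cluster the cell lines.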



Array quantitation and data processing. Following hybridization, arrays
were scanned using a laser-scanning microscope (ref. 17;
http://cmgm.stanford.edu/pbrown). Separate images were acquired for Cy3
and Cy5. We carried out data reduction with the program ScanAlyze
(M.B.E., available
differentially expressed genes in healthy and diseased subjects

Cross Reference to Related Applications:
This application is a continuation-in-part application of U.S. Serial No.
08/195,485 filed February 14, 1994, the contents of which are incorporated herein by
reference.

Field of the Invention

The present invention relates to the use of immobilized
oligonucleotide/polynucleotide or polynucleotide sequences for the identification,
sequencing and characterization of genes which are implicated in disease, infection,
or development and the use of such identified genes and the proteins encoded thereby
in diagnosis, prognosis, therapy and drug discovery.

Background of the Invention 

Identification, sequencing and characterization of genes, especially
human genes, is a major goal of modern scientific research. By identifying genes,
determining their sequences and characterizing their biological function, it is possible
to employ recombinant DNA technology to produce large quantities of valuable "gene
products", e.g., proteins and peptides. Additionally, knowledge of gene sequences
can provide a key to diagnosis, prognosis and treatment of a variety of disease states
in plants and animals which are characterized by inappropriate expression and/or
repression of selected gene(s) or by the influence of external factors, e.g., carcinogens
or teratogens, on gene function. The term disease-associated gene(s) is used herein
in its broadest sense to mean not only genes associated with classical inherited
diseases, but also those associated with genetic predisposition to disease as well as
infectious or pathogenic states resulting from gene expression by infectious agents or
the effect on host cell gene expression by the presence of such a pathogen or its
products. Locating disease-associated genes will permit the development of
diagnostic and prognostic reagents and methods, as well as possible therapeutic
regimens, and the discovery of new drugs for treating or preventing the occurrence of
such diseases.

Methods have been described for the identification of certain novel
gene sequences, referred to as Expressed Sequence Tags (EST) [see, e.g., Adams et
al., Science, 252:1651-1656 (1991); and International Patent Application No.
WO93/00353, published January 7, 1993]. Conventionally, an EST is a specific cDNA
polynucleotide sequence, or tag, about 150 to 400 nucleotides in length, derived from
a messenger RNA molecule by reverse transcription, which is a marker for, and
component of, a human gene actually transcribed in vivo. However, as used herein an
EST also refers to a genomic DNA fragment derived from an organism, such as a
microorganism, the DNA of which lacks intron regions.

A variety of techniques have been described for identifying particular
gene sequences on the basis of their gene products. For example, several techniques
are described in the art [see, e.g., International Patent Application No. WO91/07087,
published May 30, 1991]. Additionally, known methods exist for the amplification of
desired sequences [see, e.g., International Patent Application No. WO91/17271,
published November 14, 1991, among others].

However, at present, there exist no established methods for filling the
need in the art for methods and reagents which employ fragments of differentially
expressed genes of known, unknown (or previously unrecognized) function or
consequence to provide diagnostic and therapeutic methods and reagents for diagnosis
and treatment of disease or infection, which conditions are characterized by such
genes and gene products. It should be appreciated that it is the expression differences
that are diagnostic of the altered state (e.g., predisease, disease, pathogenic,
progression or infectious). Such genes associated with the altered state are likely to
be the targets of drug discovery, whether the genes are the cause or the effect of the
condition. Identification of such genes provides insight into which gene expression
needs to be re-altered in order to reestablish the healthy state.

Summary of the Invention 

In one aspect, the invention provides methods for identifying gene(s)
which are differentially expressed, for example, in a normal healthy organism and an
organism having a disease. The method involves producing and comparing
hybridization patterns formed between samples of expressed mRNA or cDNA
polynucleotide sequences obtained from either analogous cells, tissues or organs of a
healthy organism and a diseased organism and a defined set of
oligonucleotide/polynucleotide/polynucleotide sequence probes from either a
healthy organism or a diseased organism immobilized on a support. Those defined
oligonucleotide/polynucleotide sequences are representative of the total expressed
genetic component of the cells, tissues, organs or organism as defined by the collection
of partial cDNA sequences (ESTs). The differences between the hybridization
patterns permit identification of those particular EST or gene-specific
oligonucleotide/polynucleotide sequences associated with differential expression, and
the identification of the EST permits identification of the clone from which it was
derived and, using ordinary skill, further cloning and, if desired, sequencing of the
full-length cDNA and genomic counterpart, i.e., gene, from which it was obtained.
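As a purely illustrative sketch of the comparison step described above, the following code flags ESTs whose hybridization signal differs between a healthy and a diseased sample. The specification does not state a numeric criterion in this passage; the twofold threshold, the signal floor, the function name and the EST identifiers are all hypothetical.

```python
def differential_ests(healthy, diseased, min_fold=2.0, floor=1.0):
    """Return {est_id: fold_change} for ESTs whose signal differs between samples.

    `healthy` and `diseased` map EST identifiers to hybridization signal.
    `min_fold` and `floor` are hypothetical parameters chosen for illustration.
    """
    flagged = {}
    for est_id in healthy.keys() & diseased.keys():
        h = max(healthy[est_id], floor)   # avoid dividing by near-zero signal
        d = max(diseased[est_id], floor)
        fold = d / h
        if fold >= min_fold or fold <= 1.0 / min_fold:
            flagged[est_id] = fold
    return flagged

# Hypothetical signals for three immobilized ESTs.
healthy = {"EST_A": 100.0, "EST_B": 100.0, "EST_C": 400.0}
diseased = {"EST_A": 110.0, "EST_B": 400.0, "EST_C": 100.0}
print(sorted(differential_ests(healthy, diseased).items()))
```

Each flagged identifier would then lead back to the clone from which the EST was derived, as the paragraph above describes.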

In another aspect, the invention provides methods substantially similar
to those described above, but which permit identification of those gene(s) of a
pathogen which are expressed in any biological sample of an infected organism based
on comparative hybridization of RNA/cDNA samples derived from a healthy versus
infected organism, hybridized to an oligonucleotide/polynucleotide set representative
of the gene coding complement of the pathogen of interest.

In another aspect, the invention provides methods substantially similar
to those described above, but which permit identification of those EST-specific
oligonucleotide/polynucleotide sequences of host gene(s) which represent genes being
differentially expressed/altered in expression by the disease state, or infection and are
expressed in any biological sample of an infected organism based on comparative
hybridization of RNA/cDNA samples derived from a healthy versus infected
organism of interest.

In a further aspect, the methods described above and in detail below
also provide methods for diagnosis of diseases or infections characterized by
differentially expressed genes, the expression of which has been altered as a result of
infection by the pathogen or disease causing agent in question. All identified
differences provide the basis for diagnostic testing, be it the altered expression of
endogenous genes or the patterned expression of the genes of the infecting organism.
Such patterns of altered expression are defined by comparing RNA/cDNA from the
two states hybridized against a panel of oligonucleotide/polynucleotides representing
the expressed gene component of a cell, tissue, organ or organism as defined by its
collection of ESTs.

Yet a further aspect of this invention provides a composition suitable
for use in hybridization, which comprises a solid surface on which is immobilized at
pre-defined regions thereon a plurality of defined oligonucleotide/polynucleotide
sequences for hybridization, each sequence comprising a fragment of an EST isolated
from a cDNA or DNA library prepared from at least one selected tissue or cell
sample of a healthy (i.e., pre-disease state) animal, at least one analogous sample of
an animal having a disease, at least one analogous sample of an animal infected with a
pathogen or the pathogen itself, or any combination or multiple combinations thereof.

An additional aspect of the invention provides an isolated gene 

35 sequence which is differentially expressed in a normal healthy animal and an animal 
having a disease, and is identified by the methods above. Similarly, an isolated 
pathogen gene sequence which is expressed in tissue or cell samples of an infected 
animal can be identified by the methods above. 

Yet another aspect of the invention is that it provides not only a means
for a static diagnostic but also a means for carrying out the procedure over
time to measure disease progression as well as to monitor the efficacy of disease
treatment regimes, including any toxicological effects thereof.

Another aspect of the invention is an isolated protein produced by
expression of the gene sequences identified above. Such proteins are useful in
therapeutic compositions or diagnostic compositions, or as targets for drug
development.

Other aspects and advantages of the present invention are described
further in the following detailed description of the preferred embodiments thereof.

Detailed Description of the Invention 

The present invention meets the unfulfilled needs in the art by
providing methods for the identification and use of gene fragments and genes, even
those of unknown full length sequence and unknown function, which are
differentially expressed in a healthy animal and in an animal having a specific disease
or infection, by use of ESTs derived from DNA libraries of healthy and/or
diseased/infected animals. Employing the methods of this invention permits the
identification and isolation of such genes by using their corresponding ESTs
and thereby also permits the production of protein products encoded by such genes.
The genes themselves and/or protein products, if desired, may be employed in the
diagnosis or therapy of the disease or infection with which the genes are associated
and in the development of new drugs therefor.

It has been appreciated that one or more differentially identified EST
or gene-specific oligonucleotide/polynucleotides define a pattern of differentially
expressed genes diagnostic of a predisease, disease or infective state. A knowledge of
the specific biological function of the EST is not required, only that the EST
identifies a gene or genes whose altered expression is associated reproducibly with
the predisease, disease or infectious state. The differences permit the identification of
gene products altered in their expression by the disease and represent those products
most likely to be targets of therapeutic intervention. Similarly, the product may be of
the infecting organism itself and also be an effective target of intervention.

I. Definitions.

Several words and phrases used throughout this specification are
defined as follows:

As used herein, the term "gene" refers to the genomic nucleotide
sequence from which a cDNA sequence is derived, which cDNA produces an EST, as
described below. The term gene classically refers to the genomic sequence, which,
upon processing, can produce different cDNAs, e.g., by splicing events. However,
for ease of reading, any full-length counterpart cDNA sequence which gives rise to an
EST will also be referred to by shorthand herein as a 'gene'.

The term "organism" includes, without limitation, microbes, plants and
animals.

The term "animal" is used in its broadest sense to include all members
of the animal kingdom, including humans. It should be understood, however, that
according to this invention the same species of animal which provides the biological
sample also is the source of the defined immobilized oligonucleotide/polynucleotides
as defined below.

The term "pathogen" is defined herein as any molecule or organism
which is capable of infecting an animal or plant and replicating its nucleic acid
sequences in the cells or tissues of that animal or plant. Such a pathogen is generally
associated with a disease condition in the infected animal or plant. Such pathogens
may include viruses, which replicate intra- or extra-cellularly, or other organisms,
such as bacteria, fungi or parasites, which generally infect tissues or the blood.
Certain pathogens or microorganisms are known to exist in sequential and
distinguishable stages of development, e.g., latent stages, infective stages, and stages
which cause symptomatic diseases. In these different stages, the pathogens are
anticipated to express differentially certain genes and/or turn on or off host cell gene
expression.

As used herein, the term "disease" or "disease state" refers to any
condition which deviates from a normal or standardized healthy state in an organism
of the same species in terms of differential expression of the organism's genes. In
other words, a disease state can be any illness or disorder, be it of genetic or
environmental origin, for example, an inherited disorder such as certain breast
cancers, or a disorder which is characterized by expression of gene(s) normally in an
inactive, 'turned off' state in a healthy animal, or a disorder which is characterized by
under-expression or no expression of gene(s) which are normally activated or 'turned
on' in a normal healthy animal. Such differential expression of genes may also be
detected in a condition caused by infection, inflammation, or allergy, a condition
caused by development or aging of the animal, or a condition caused by administration
of a drug or exposure of the animal to another agent, e.g., nutrition, which affects
gene expression. Essentially, the methods described herein can be adapted to detect
differential gene expression resulting from any cause, by manipulation of the defined
oligonucleotide/polynucleotides and the samples tested as described below. The
concept of disease or disease state also includes its temporal aspects in terms of
progression and treatment.

The phrase "differentially expressed" refers to those situations in
which a gene transcript is found in differing numbers of copies, or in activated vs.
inactivated states, in different cell types or tissue types of an organism having a
selected disease, as contrasted to the levels of the gene transcript found in the same
cells or tissues of a healthy organism. Genes may be differentially expressed in
differing states of activation in microorganisms or pathogens in different stages of
development. For example, multiple copies of gene transcripts may be found in an
organism having a selected disease, while only one, or significantly fewer, copies of
the same gene transcript are found in a healthy organism, or vice versa.

As used herein, the term "solid support" refers to any known substrate
which is useful for the immobilization of large numbers of
oligonucleotide/polynucleotide sequences by any available method to enable
detectable hybridization of the immobilized oligonucleotide/polynucleotide sequences
with other polynucleotide sequences in a sample. Among a number of available solid
supports, one desirable example is the supports described in International Patent
Application No. WO91/07087, published May 30, 1991. Also useful are supports such
as, but not limited to, nitrocellulose, nylon, glass, silica and Pall Biodyne C®. It is
also anticipated that improvements yet to be made to conventional solid supports may
be employed in this invention.

The term "surface" means any generally two-dimensional structure on
a solid support to which the desired oligonucleotide/polynucleotide sequence is
attached or immobilized. A surface may have steps, ridges, kinks, terraces and the
like.

As used herein, the term "predefined region" refers to a localized area 
on a surface of a solid support on which is immobilized one or multiple copies of a 
particular oligonucleotide/polynucleotide sequence and which enables the 
identification of the oligonucleotide/polynucleotide at the position, if hybridization of
that oligonucleotide/polynucleotide to a sample polynucleotide occurs.

The term "immobilized" refers to the attachment of the
oligonucleotide/polynucleotide to the solid support. Means of immobilization are
known and conventional to those of skill in the art, and may depend on the type of
support being used.

By "EST" or "Expressed Sequence Tag" is meant a partial DNA or
cDNA sequence of about 150 to 500, more preferably about 300, sequential
nucleotides of a longer sequence obtained from a genomic or cDNA library prepared
from a selected cell, cell type, tissue or tissue type, organ or organism, which longer
sequence corresponds to an mRNA of a gene found in that library. An EST is
generally DNA. One or more libraries made from a single tissue type typically
provide at least about 3000 different (i.e., unique) ESTs and potentially the full
complement of all possible ESTs representing all cDNAs, e.g., 50,000-100,000 in an
animal such as a human. Further background and information on the construction of
ESTs is described in M. D. Adams et al, Science, 252:1651-1656 (1991); and
International Application Number PCT/US92/05222 (January 7, 1993).

As used herein, the term "defined oligonucleotide/polynucleotide
sequence" refers to a known nucleotide sequence fragment of a selected EST or gene.
This term is used interchangeably with the term "fragments of EST". These
sequential sequences are generally comprised of between about 15 to about 45
nucleotides and more preferably between about 20 to about 25 nucleotides in length.
Thus any single EST of 300 nucleotides in length may provide about 280 different
defined oligonucleotide/polynucleotide sequences of 20 nucleotides in length (e.g.,
20-mers). The lengths of the defined oligonucleotide/polynucleotides may be readily
increased or decreased as desired or needed, depending on the limitations of the solid
support on which they may be immobilized or the requirements of the hybridization
conditions to be employed. The length is generally guided by the principle that it
should be sufficient to ensure that the sequence is, on average, represented only once in
the population to be examined. Generally, these defined
oligonucleotide/polynucleotides are RNA or DNA and are preferably derived from
the anti-sense strand of the EST sequence or from a corresponding mRNA sequence
to enable their hybridization with samples of RNA or DNA. Modified nucleotides
may be incorporated to increase stability and hybridization properties.
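
The figure of about 280 different 20-mers from a 300-nucleotide EST follows from counting overlapping windows: a sequence of length L contains L - k + 1 windows of length k. The following Python sketch illustrates that count, together with a rough estimate of how often a given k-mer would occur by chance. The function names and the equal-base-frequency assumption are illustrative only and are not taken from the specification.

```python
def count_windows(est_length: int, k: int) -> int:
    """Number of overlapping k-mers in a sequence of est_length nucleotides."""
    if k <= 0 or est_length < k:
        return 0
    return est_length - k + 1


def enumerate_kmers(sequence: str, k: int) -> list:
    """List every overlapping k-mer of a sequence, reading left to right."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def expected_random_matches(k: int, population_nt: int) -> float:
    """Expected chance occurrences of one specific k-mer in population_nt
    nucleotides of random sequence, assuming equal base frequencies
    (a simplifying assumption; real transcript populations are not random)."""
    return population_nt / 4 ** k
```

Under these assumptions, `count_windows(300, 20)` evaluates to 281, consistent with the "about 280" stated above, and `expected_random_matches` gives one way to judge whether a chosen length is likely to be represented only once in the population examined.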
By the term "plurality of defined oligonucleotide/polynucleotide
sequences" is meant the following. A surface of a solid support may immobilize a
large number of "defined oligonucleotide/polynucleotides". For example, depending
upon the nature of the surface, it can immobilize from about 300 to upwards of
60,000 defined 20-mer oligonucleotide/polynucleotides. It is anticipated that future
improvements to solid surfaces will permit considerably larger such pluralities to be
immobilized on a single surface. A "plurality" of sequences refers to the use on any
one solid support of multiple different defined oligonucleotide/polynucleotides from a
single EST from a selected library, as well as multiple different defined
oligonucleotide/polynucleotides from different ESTs from the same library or many
libraries from the same or different tissues, and may also include multiple identical
copies of defined oligonucleotide/polynucleotides. Ultimately a plurality has at least
one oligonucleotide/polynucleotide per expressed gene in the entire organism. For
example, from a library producing about 5,000-10,000 ESTs, a single support can
include at least about 1-20 defined oligonucleotide/polynucleotides representing every
EST in that library. The composition of defined oligonucleotide/polynucleotides
which make up a surface according to this invention may be selected or designed as
desired.

The term "sample" is employed in the description of this invention in
several important ways. As used herein, the term "sample" encompasses any cell or
tissue from an organism. Any desired cell or tissue type in any desired state may be
selected to form a sample. For example, the sample cell desired may be a human T
cell; the desired cell type for use in this invention may be a quiescent T cell or an
activated T cell.

By the phrase "analogous sample" or "analogous cell or tissue" is
meant that, according to this invention, when the ESTs which provide the defined
oligonucleotide/polynucleotides are produced from a cDNA library prepared from a
single tissue or cell type source sample, e.g., liver tissue of a human, then the samples
used to hybridize to those immobilized defined oligonucleotide/polynucleotides are
preferably provided by the same type of sample from either a healthy or diseased
animal, i.e., liver tissue of a healthy human and liver tissue of a diseased or infected
human or from a human suspected of having that disease or infection. Alternatively,
if the surface contains defined oligonucleotide/polynucleotides from multiple cells or
tissues, then the "samples" which are hybridized thereto can be, but are not limited to,
samples obtained from analogous multiple tissues or cells.

The term "detectably hybridizing" means that the sample from the
healthy organism or diseased or infected organism is contacted with the defined
oligonucleotide/polynucleotides on the surface for sufficient time to permit the
formation of patterns of hybridization on the surfaces caused by hybridization
between certain polynucleotide sequences in the samples and certain immobilized
defined oligonucleotide/polynucleotides. These patterns are made detectable by the
use of available conventional techniques, such as fluorescent labelling of the samples.
Preferably hybridization takes place under stringent conditions, e.g., revealing
homologies of about 95%. However, if desired, other less stringent conditions may
be selected. Techniques and conditions for hybridization at selected stringencies are
well known in the art [see, e.g., Sambrook et al, Molecular Cloning. A Laboratory
Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY (1989)].

II. Compositions of the Invention

The present invention is based upon the use of ESTs from any desired
cell or tissue in known technologies for oligonucleotide/polynucleotide hybridization.

A. ESTs

An EST, as defined above, is, for an animal, a sequence from a
cDNA clone that corresponds to an mRNA. The EST sequences useful in the present
invention are isolated preferably from cDNA libraries using a rapid screening and
sequencing technique. Custom made cDNA libraries are made using known
techniques. See, generally, Sambrook et al, cited above. Briefly, mRNA from a
selected cell or tissue is reverse transcribed into complementary DNA (cDNA) using
the reverse transcriptase enzyme and made double-stranded using RNase H coupled
with DNA polymerase or reverse transcriptase. Restriction enzyme sites are added to
the cDNA and it is cloned into a vector. The result is a cDNA library. Alternatively,
commercially available cDNA libraries may be used. Libraries of cDNA can also be
generated from recombinant expression of genomic DNA using known techniques,
including polymerase chain reaction-derived techniques.

ESTs (which can range from about 150 to about 500 nucleotides in
length, preferably about 300 nucleotides) can be obtained through sequence analysis
from either end of the cDNA insert. Desirably, the DNA libraries used to obtain
ESTs use directional cloning methods so that either the 5' end of the cDNA (likely to
contain coding sequence) or the 3' end (likely to be a non-coding sequence) can be
selectively obtained.

In general, the method for obtaining ESTs comprises applying
conventional automated DNA sequencing technology to screen clones,
advantageously randomly selected clones, from a cDNA library. The cDNA libraries
from the desired tissue can be preprocessed, or edited, by conventional techniques to
reduce repeated sequencing of high and intermediate abundance clones and to
maximize the chances of finding rare messages from specific cell populations.
Preferably, preprocessing includes the use of defined composition prescreening
probes, e.g., cDNA corresponding to mitochondria, abundant sequences, ribosomes,
actins, myelin basic polypeptides, or any other known high abundance peptide. These
prescreening probes used for preprocessing are generally derived from known ESTs.
Other useful preprocessing techniques include subtraction hybridization, which
preferentially reduces the population of highly represented sequences in the library
[e.g., see Fargnoli et al, Anal. Biochem., 187:364 (1990)] and normalization, which
results in all sequences being represented in approximately equal proportions in the
library [Patanjali et al, Proc. Natl. Acad. Sci. USA, 88:1943 (1991)]. Additional
prescreening/differential screening approaches are known to those skilled in the art.

ESTs can then be generated from partial DNA sequencing of the
selected clones. The ESTs useful in the present invention are preferably generated
using low redundancy of sequencing, typically a single sequencing reaction. While
single sequencing reactions may have an accuracy as low as 90%, this nevertheless
provides sufficient fidelity for identification of the sequence and design of PCR
primers.

If desired, the location of an EST in a full length cDNA is determined
by analyzing the EST for the presence of coding sequence. A conventional computer
program is used to predict the extent and orientation of the coding region of a
sequence (using all six reading frames). Based on this information, it is possible to
infer the presence of start or stop codons within a sequence and whether the sequence
is completely coding or completely non-coding or a combination of the two. If start
or stop codons are present, then the EST can cover part of the 5'-untranslated or
3'-untranslated part of the mRNA (respectively) as well as part of the coding
sequence. If no coding sequence is present, it is likely that the EST is derived from
the 3' untranslated sequence due to its longer length and the fact that most cDNA
library construction methods are biased toward the 3' end of the mRNA. It should be
understood that both coding and non-coding regions may provide ESTs equally useful
in the described invention.
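
The six-frame analysis described above can be sketched in Python as follows. This is a simplified illustration only, not the conventional program referred to in the specification: it merely counts start (ATG) and stop (TAA, TAG, TGA) codons in the three frames of the given strand and the three frames of its reverse complement, and treats a frame without any stop codon as a candidate coding frame. All names are hypothetical.

```python
# Illustrative sketch only; not the program used in the specification.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def codons(seq: str, offset: int) -> list:
    """Split seq into complete codons starting at the given offset (0, 1 or 2)."""
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]


def scan_six_frames(est: str) -> dict:
    """Count start and stop codons in each of the six reading frames.

    Frames +1..+3 read the given strand; frames -1..-3 read its reverse
    complement.
    """
    est = est.upper()
    strands = {"+": est, "-": reverse_complement(est)}
    report = {}
    for sign, strand in strands.items():
        for offset in range(3):
            frame = codons(strand, offset)
            report[f"{sign}{offset + 1}"] = {
                "starts": sum(c == START for c in frame),
                "stops": sum(c in STOPS for c in frame),
            }
    return report


def open_frames(est: str) -> list:
    """Names of frames that contain no stop codon (candidate coding frames)."""
    return [name for name, counts in scan_six_frames(est).items()
            if counts["stops"] == 0]
```

A practical tool would also weigh codon usage and frame length; given the single-pass sequencing accuracy mentioned above, a spurious stop codon from a sequencing error is possible, so the output of a sketch like this is a hint and not a determination.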

A number of specific ESTs suitable for use in the present
invention are described in Adams et al (supra), which may be incorporated by
reference herein, to describe non-essential examples of desirable ESTs. Other ESTs
exist in the art which may also be useful in this invention, as will ESTs yet to be
developed by these known techniques.

B. Preparing the Solid Support of the Invention

Oligonucleotide sequences which are fragments of defined
sequence are derived from each EST by conventional means, e.g., conventional
chemical synthesis or recombinant techniques. Each defined
oligonucleotide/polynucleotide sequence as described above is a fragment, which can be, but
is not necessarily, an anti-sense fragment, of an EST isolated from a DNA library
prepared from a selected cell or tissue type from a selected animal. For use in the
present invention, it is presently preferred that the defined
oligonucleotide/polynucleotide sequences are 20-25mers. As described above, for
each EST a number of such 20-25mers may be generated. The lengths may vary as
described above, as may the composition. For example,
oligonucleotide/polynucleotides can be modified based on the Oligo 4.0 or similar
programs to predict hybridization potential or to include modified nucleotides for the
reasons given above. It is also appreciated that large DNA segments may be
employed, including entire ESTs or even full length genes, particularly when inserted
into cloning vectors.

A plurality of these defined oligonucleotide/polynucleotide
sequences are then attached to a selected solid support conventionally used for the
attachment of nucleotide sequences, again by known means. In contrast to other
technologies available in the art, this support is designed to contain defined, not
random, oligonucleotide/polynucleotide sequences. The EST fragments, or defined
oligonucleotide/polynucleotide sequences, immobilized on the solid support can
include fragments of one or more ESTs from a library of at least one selected tissue
or cell sample of a healthy animal, at least one analogous sample of the animal having
a disease, at least one analogous sample of the animal infected with a pathogen, and
any combination thereof.

Numerous conventional methods are employed for attaching
biological molecules such as oligonucleotide/polynucleotide sequences to surfaces of
a variety of solid supports. See, e.g., Affinity Techniques, Enzyme Purification: Part
B, Methods in Enzymology, Vol. 34, ed. W.B. Jakoby, M. Wilchek, Acad. Press,
NY (1974); Immobilized Biochemicals and Affinity Chromatography, Advances in
Experimental Medicine and Biology, vol. 42, ed. R. Dunlap, Plenum Press, NY
(1974); U. S. Patent No. 4,762,881; U. S. Patent No. 4,542,102; European Patent
Publication No. 391,608 (October 10, 1990); U. S. Patent No. 4,992,127 (Nov. 21,
1989).

One desirable method for attaching
oligonucleotide/polynucleotide sequences derived from ESTs to a solid support is
described in International Application No. PCT/US90/06607 (published May 30,
1991). Briefly, this method involves forming predefined regions on a surface of a
solid support, where the predefined regions are capable of immobilizing ESTs. The
methods make use of binding substances attached to the surface which enable
selective activation of the predefined regions. Upon activation, these binding
substances become capable of binding and immobilizing
oligonucleotide/polynucleotides based on EST or longer gene sequences.

Any of the known solid substrates suitable for binding
oligonucleotide/polynucleotides at pre-defined regions on the surface thereof for
hybridization and methods for attaching the oligonucleotide/polynucleotides thereto
may be employed by one of skill in the art according to this invention. Similarly,
known conventional methods for making hybridization of the immobilized
oligonucleotide/polynucleotides detectable, e.g., fluorescence, radioactivity,
photoactivation, biotinylation, solid state circuitry, and the like may be used in this
invention.

Thus, by resorting to known techniques, the invention provides
a composition suitable for use in hybridization which consists of a surface of a solid
support on which is immobilized at pre-defined regions on said surface a plurality of
defined oligonucleotide/polynucleotide sequences for hybridization. For example,
one composition of this invention is a solid support on which are immobilized oligos
of EST fragments from a library constructed from a single cell type, e.g., a human
stem cell, or a single tissue, e.g., human liver, from a healthy human. Still another
composition of this invention is another solid support on which are immobilized
oligos of EST fragments from a library constructed from a single cell type or a tissue
from a human having a selected disease or predisposition to a selected disease, e.g.,
liver cancer.

Another embodiment of the compositions of this invention
includes a single solid support having oligonucleotides of ESTs from single cell
or single tissue libraries from both a healthy and a diseased human. Still other
embodiments include a single support on which are immobilized oligos of EST
fragments from more than one tissue or cell library from a healthy human, or a single
support on which are immobilized oligos from more than one tissue or cell library from both
healthy and diseased animals or humans. A preferred composition of this invention is
anticipated to be a single support containing oligos of ESTs for all known cells and
tissues from a selected organism.

III. The Methods of the Invention

A. Identification of Genes

The present invention employs the compositions described
above in methods for identifying genes which are differentially expressed in a normal
healthy organism and an organism having a disease or infection. These methods may
be employed to detect such genes, regardless of the state of knowledge about the
function of the gene. The method of this invention, by use of the compositions
containing multiple defined EST fragments from a single gene as described above, is
able to detect levels of expression of genes, or in other cases simply the expression or
lack thereof, which differ between normal, healthy organisms and organisms having a
selected disease, disorder or infection.

One such method employs a first surface of a solid support on
which is immobilized at pre-defined regions thereon a plurality of defined
oligonucleotide/polynucleotide sequences, described above, of an EST or longer gene
fragment isolated from a cDNA library prepared from at least one selected tissue or
cell sample of a healthy animal (the "healthy test surface") and a second such surface
on which is immobilized at pre-defined regions a plurality of defined
oligonucleotide/polynucleotide sequences of an EST or longer gene fragment isolated
from at least one analogous tissue of an animal having a selected disease (the "disease
test surface"). These test surfaces may be standardized for the selected animal or
selected cell or tissue sample from that animal (i.e., they are prescreened for
polymorphisms in the species population).

Polynucleotide sequences are then isolated from mRNA and/or
cDNA from a biological sample from a known healthy animal ("healthy control") and
a second sample is similarly prepared from a sample from a known diseased animal
("disease sample"). These two samples are desirably selected from the cell or tissue
analogous to that which provided the immobilized oligonucleotide/polynucleotides.

According to the method, the healthy control sample is
contacted with one set of the healthy test surface and the disease test surface
described above for a time sufficient to permit detectable hybridization to occur
between the sample and the immobilized defined oligonucleotide/polynucleotides on
each surface. The results of this hybridization are a first hybridization pattern formed
between the nucleotides of the healthy control and the healthy test surface and a second
hybridization pattern formed between the nucleotides of the healthy control sample and
the disease test surface.

In a similar manner, the disease sample is detectably hybridized 
to another set of healthy test and disease test surfaces, forming a third hybridization 
pattern between the disease sample and healthy test surface and a fourth hybridization
pattern between the disease sample and the disease test surface.

Comparing the four hybridization patterns permits detection of
those defined oligonucleotide/polynucleotides which are differentially expressed
between the healthy control and the disease sample by the presence of differences in
the hybridization patterns at pre-defined regions. The
oligonucleotide/polynucleotides on each surface which correspond to the pattern
differences may be readily identified with the corresponding EST or longer gene
fragment from which the oligonucleotide/polynucleotides are obtained.

In another embodiment of the method of this invention, the
same process is employed, with the exception that the plurality of defined
oligonucleotide/polynucleotide sequences forming the healthy test surface and the
disease test surface are immobilized on a single solid support. For example,
each fragment of an EST or longer gene fragment on the surface is isolated from at
least two cDNA libraries prepared from a selected cell or tissue sample of a healthy
animal and an analogous selected cell or tissue sample of an animal having a disease.

According to this embodiment, the healthy control sample is
detectably hybridized to a copy of this single solid surface, forming one hybridization
pattern with oligonucleotide/polynucleotides associated with both the healthy and
diseased animal. Similarly, the disease sample is detectably hybridized to a second
copy of this single solid surface, forming one hybridization pattern with
oligonucleotide/polynucleotides associated with both the healthy and diseased animal.

Comparing the two hybridization patterns permits detection of
those defined oligonucleotide/polynucleotides which are differentially expressed
between the healthy control and the disease sample by the presence of differences in
the hybridization patterns at pre-defined regions. The
oligonucleotide/polynucleotides on each surface which correspond to the pattern
differences may be readily identified with the corresponding EST or longer gene
fragment from which the oligonucleotide/polynucleotides are obtained.
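
The comparison of two hybridization patterns at pre-defined regions can be illustrated with a short Python sketch. The specification does not prescribe a data layout or a numerical criterion, so everything here is an assumption made for illustration: each pattern is modeled as a mapping from a region identifier to a signal intensity, a region is called different when the signals differ by at least a chosen fold change, and a hypothetical lookup table links each region to the EST from which its oligonucleotide was derived.

```python
# Illustrative sketch; the data layout, threshold and names are assumptions.
def differing_regions(healthy, disease, min_fold=2.0, floor=1.0):
    """Return predefined regions whose signal differs between two patterns.

    healthy, disease: dicts mapping a region identifier to a non-negative
    hybridization signal read from two copies of the same surface.
    A region is reported when the disease/healthy ratio is at least min_fold
    or at most 1/min_fold; signals below floor are raised to floor so that
    presence-versus-absence calls do not divide by zero.
    """
    hits = {}
    for region in healthy.keys() & disease.keys():
        h = max(healthy[region], floor)
        d = max(disease[region], floor)
        ratio = d / h
        if ratio >= min_fold or ratio <= 1.0 / min_fold:
            hits[region] = ratio
    return hits


def ests_for_regions(hits, region_to_est):
    """Map differing regions back to the ESTs their oligos were derived from."""
    return sorted({region_to_est[r] for r in hits if r in region_to_est})
```

Because several defined oligonucleotides may derive from one EST, several differing regions can resolve to the same EST; `ests_for_regions` therefore returns each EST once.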

The identification of one or more ESTs as the source of the
defined oligonucleotide/polynucleotide which produced a "difference" in
hybridization patterns according to these methods permits ready identification of the
gene from which those ESTs were derived. Because oligonucleotides are of sufficient
length that they will hybridize under stringent conditions only with an RNA/cDNA for
that gene to which they correspond, the oligo can be used to identify the EST and in
turn the clone from which it was derived and, by subsequent cloning, to obtain the
sequence of the full-length cDNA and its genomic counterparts, i.e., the gene, from
which it was obtained.

In other words, the ESTs identified by the method of this
invention can be employed to determine the complete sequence of the mRNA, in the
form of transcribed cDNA, by using the EST as a probe to identify a cDNA clone
corresponding to a full-length transcript, followed by sequencing of that clone. The
EST or the full length cDNA clone can also be used as a probe to identify a genomic
clone or clones that contain the complete gene including regulatory and promoter
regions, exons, and introns.

It should be appreciated that one does not have to be restricted
to using ESTs from the particular tissue from which probe RNA or cDNA is obtained;
rather any or all ESTs (known or unknown) may be placed on the support.
Hybridization will be used to form diagnostic patterns or to identify which particular
EST is detected. For example, all known ESTs from an organism are used to produce
a "master" solid support to which control sample and disease samples are alternately
hybridized. One then detects a pattern of hybridization associated with the particular
disease state which then forms the basis of a diagnostic test or the isolation of
disease specific ESTs from which the intact gene may be cloned and sequenced,
leading ultimately to a defined therapeutic target.

Methods for obtaining complete gene sequences from ESTs are
well-known to those of skill in the art. See, generally, Sambrook et al, cited above.
Briefly, one suitable method involves purifying the DNA from the clone that was
sequenced to give the EST and labeling the isolated insert DNA. Suitable labeling
systems are well known to those of skill in the art [see, e.g., Basic Methods in
Molecular Biology, L. G. Davis et al, ed., Elsevier Press, NY (1986)]. The labeled
EST insert is then used as a probe to screen a lambda phage cDNA library or a
plasmid cDNA library, identifying colonies containing clones related to the probe
cDNA which can be purified by known methods. The ends of the newly purified
clones are then sequenced to identify full length sequences and complete sequencing
of full length clones is performed by enzymatic digestion or primer walking. A
similar screening and clone selection approach can be applied to clones from a
genomic DNA library.

Additionally, an EST or gene identified by this method as
associated with inherited disorders can be used to determine at what stage during
embryonic development the selected gene from which it is derived is developed by
screening embryonic DNA libraries from various stages of development, e.g., 2-cell,
8-cell, etc., for the selected gene. As has been mentioned above, the invention may
be applied in additional temporal modes for monitoring the progression of a disease
state, the efficacy of a particular treatment modality or the aging process of an
individual.

Thus, the methods of this invention permit the identification,
isolation and sequencing of a gene which is differentially expressed in a selected
disease/infection. As described in more detail below, the identified gene may then be
employed to obtain any protein encoded thereby, or may be employed as a target for
diagnostic methods or therapeutic approaches to the treatment of the disease,
including, e.g., drug development.

The same methods as described above for the identification of
genes, including genes of unknown function, which are differentially expressed in a
disease state, may also be employed to identify other genes of interest. For example,
another embodiment of this invention includes a method for identifying a gene of a
pathogen which is expressed in a biological sample of an animal infected with that
pathogen or the gene of the host which is altered in its expression as a result of the
infection.

One such method employs a healthy test surface as described
above, employing defined oligonucleotide/polynucleotides from a sample of a
healthy, uninfected animal. The second such surface has immobilized at pre-defined
regions thereon a plurality of defined oligonucleotide/polynucleotide sequences of
ESTs isolated from at least one analogous tissue or cell sample of an infected animal
(the "infection test surface"). Polynucleotide sequences are isolated from a biological
sample from a healthy animal ("healthy control") and a second sample is similarly
prepared from an animal infected with the selected pathogen ("infection sample").
These two samples are desirably selected from the cell or tissue analogous to that
which provided the immobilized oligonucleotide/polynucleotides. It would also be
possible to provide samples from the nucleic acid of the pathogen itself.

According to the method, the healthy control sample is
contacted with one set of the healthy test surface and the infection test surface
described above for a time sufficient to permit detectable hybridization to occur
between the sample and the immobilized defined oligonucleotide/polynucleotides on
each surface. The results of this hybridization are a first hybridization pattern formed
between the nucleotides of healthy control and the healthy test surface and a second
hybridization pattern formed between the nucleotides of healthy control sample and
the infection test surface.

In a similar manner, the infection sample is detectably
hybridized to another set of healthy test and infection test surfaces, forming a third
hybridization pattern between the infection sample and healthy test surface and a
fourth hybridization pattern between the infection sample and the infection test
surface.

Comparing the four hybridization patterns permits detection of
those defined oligonucleotide/polynucleotides which are differentially expressed
between the healthy animal and the animal infected with the pathogen by the presence
of differences in the hybridization patterns at pre-defined regions. As mentioned,
differential expression is not required and simple qualitative analysis is possible by
reference to gene expression which is simply present or absent.
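
The comparison just described can be illustrated with a minimal sketch. It is not part of the disclosure: the representation of a hybridization pattern as a mapping from a pre-defined region to a signal value, the function names, the fold-change cutoff and the detection threshold are all assumptions chosen for illustration. The pairing of the first pattern with the third and the second with the fourth follows the pairing used in the diagnostic methods of this specification.

```python
from typing import Dict, Set

# Assumption: a hybridization pattern is a mapping from a pre-defined
# region identifier to a measured signal in arbitrary units.
Pattern = Dict[str, float]


def differential_regions(pattern_a: Pattern, pattern_b: Pattern,
                         min_fold: float = 2.0, floor: float = 1.0) -> Set[str]:
    """Return regions whose signals differ by at least min_fold.

    Signals below `floor` are raised to `floor` so that an absent
    signal does not cause a division by zero (hypothetical choice).
    """
    hits = set()
    for region in set(pattern_a) | set(pattern_b):
        a = max(pattern_a.get(region, 0.0), floor)
        b = max(pattern_b.get(region, 0.0), floor)
        if max(a, b) / min(a, b) >= min_fold:
            hits.add(region)
    return hits


def present_absent_differences(pattern_a: Pattern, pattern_b: Pattern,
                               detect: float = 5.0) -> Set[str]:
    """Qualitative variant: regions detected in exactly one pattern."""
    present_a = {r for r, s in pattern_a.items() if s >= detect}
    present_b = {r for r, s in pattern_b.items() if s >= detect}
    return present_a ^ present_b


def compare_four_patterns(first: Pattern, second: Pattern,
                          third: Pattern, fourth: Pattern,
                          min_fold: float = 2.0) -> Dict[str, Set[str]]:
    """Compare the same surface hybridized with the two samples.

    first/third: healthy control vs. infection sample on the healthy surface.
    second/fourth: healthy control vs. infection sample on the infection surface.
    """
    return {
        "healthy_surface": differential_regions(first, third, min_fold),
        "infection_surface": differential_regions(second, fourth, min_fold),
    }


# Illustrative values only; not measured data.
first = {"A1": 10.0, "A2": 10.0}
third = {"A1": 10.0, "A2": 50.0}
second = {"B1": 0.0, "B2": 5.0}
fourth = {"B1": 40.0, "B2": 5.0}
result = compare_four_patterns(first, second, third, fourth)
```

With these made-up values, region A2 differs on the healthy test surface and region B1 differs on the infection test surface; the qualitative variant flags B1 as present only in the infection sample.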

A second embodiment of this method parallels the second
embodiment of the method as applied to disease above, i.e., the same process is
employed, with the exception that the plurality of defined oligonucleotide/polynucleotide
sequences forming the healthy test sample surface and the infection test sample
surface are immobilized on a single solid support. The resulting first hybridization
pattern (healthy control sample with healthy/infection test sample) and second
hybridization pattern (infection sample with healthy/infection test sample) permit
detection of those defined oligonucleotide/polynucleotides which are differentially
expressed between the healthy control and the infection sample by the presence of
differences in the hybridization patterns at pre-defined regions. The
oligonucleotide/polynucleotides on each surface which correspond to the pattern
differences may be readily identified with the corresponding ESTs from which the
oligonucleotide/polynucleotides are obtained.
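
Because each sequence sits at a pre-defined region, a pattern difference can be traced back to its EST by a simple lookup. The following is a minimal sketch under stated assumptions: the (row, column) region coordinates, the EST labels, the function name and the absolute-difference cutoff are hypothetical and are not taken from the disclosure.

```python
from typing import Dict, List, Tuple

# Assumption: a region on the single solid support is addressed by (row, column).
Region = Tuple[int, int]


def ests_for_differences(region_to_est: Dict[Region, str],
                         healthy_pattern: Dict[Region, float],
                         infection_pattern: Dict[Region, float],
                         min_difference: float = 1.0) -> List[str]:
    """Return the ESTs immobilized at regions where the two patterns differ.

    A region counts as different when the absolute signal difference is at
    least min_difference (hypothetical criterion, arbitrary units).
    """
    hits = []
    for region, est in region_to_est.items():
        healthy = healthy_pattern.get(region, 0.0)
        infected = infection_pattern.get(region, 0.0)
        if abs(healthy - infected) >= min_difference:
            hits.append(est)
    return sorted(hits)


# Illustrative layout and signals only; the EST labels are placeholders.
layout = {(0, 0): "EST_0001", (0, 1): "EST_0002", (1, 0): "EST_0003"}
healthy = {(0, 0): 3.0, (0, 1): 3.0, (1, 0): 0.0}
infection = {(0, 0): 3.0, (0, 1): 0.5, (1, 0): 4.0}
changed = ests_for_differences(layout, healthy, infection, min_difference=2.0)
```

Here the placeholder ESTs at regions (0, 1) and (1, 0) are returned, since only those regions differ by at least the chosen cutoff.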

As described above for the methods for identifying differential
gene expression between diseased and healthy animals, the
oligonucleotide/polynucleotides on each surface which correspond to the pattern
differences may be readily identified with the corresponding ESTs from which the
oligonucleotide/polynucleotide sequences are obtained and the genes expressed by the
pathogen identified for similar purposes. Other embodiments of these methods may
be developed with resort to the teaching herein, by altering the samples which provide
the defined oligonucleotide/polynucleotides. For example, an EST, identified with a
differentially expressed gene by the method of this invention is also useful in
detecting genes expressed in the various stages of a pathogen's development,
particularly the infective stage and following the course of drug treatment and
emergence of resistant variants. For example, employing the techniques described
above, the EST can be used for detecting a gene in various stages of the parasitic
Plasmodium species life cycle, which include blood stages, liver stages, and
gametocyte stages.

B. Diagnostic Methods 
In addition to use of the methods and compositions of this
invention for identifying differentially expressed genes, another embodiment of this
invention provides diagnostic methods for diagnosing a selected disease state, or a
selected state resulting from aging, exposure to drugs or infection in an animal.
According to this aspect of the invention, a first surface, described as the healthy test
surface above, and a second surface, described as the disease test surface or infection
test surface, are prepared depending on the disease or infection to be diagnosed. The
same processes of detectable hybridization to a first and second set of these surfaces
with the healthy control sample and disease/infection sample are followed to provide
the four above-described hybridization patterns, i.e., healthy control sample with
healthy test surface; healthy control sample with disease/infection test surface;
disease/infection sample with healthy test surface; and disease/infection sample with
disease/infection test surface.

The diagnosis of disease or infection is provided by comparing
the four hybridization patterns. Substantial differences between the first and third
hybridization patterns, respectively, and the second and fourth hybridization patterns,
respectively, indicate the presence of the selected disease or infection in said animal.
Substantial similarities in the first and third hybridization patterns and second and
fourth hybridization patterns indicate the absence of disease or infection.
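
The decision rule above can be sketched as follows. This is an illustration only: the specification does not quantify "substantial", so the fold-change cutoff, the fraction of regions treated as substantial, and the "indeterminate" outcome for mixed results are assumptions introduced here, as are all function and parameter names.

```python
from typing import Dict

Pattern = Dict[str, float]  # assumption: region identifier -> signal


def fraction_differing(a: Pattern, b: Pattern,
                       min_fold: float = 2.0, floor: float = 1.0) -> float:
    """Fraction of regions whose signals differ by at least min_fold."""
    regions = set(a) | set(b)
    if not regions:
        return 0.0
    differing = 0
    for region in regions:
        x = max(a.get(region, 0.0), floor)
        y = max(b.get(region, 0.0), floor)
        if max(x, y) / min(x, y) >= min_fold:
            differing += 1
    return differing / len(regions)


def diagnose(first: Pattern, second: Pattern,
             third: Pattern, fourth: Pattern,
             substantial: float = 0.25) -> str:
    """Compare first with third and second with fourth.

    Returns "present" when both pairs differ substantially, "absent" when
    both pairs are substantially similar, and "indeterminate" otherwise
    (the last outcome is a hypothetical addition for mixed results).
    """
    healthy_surface = fraction_differing(first, third)
    disease_surface = fraction_differing(second, fourth)
    if healthy_surface >= substantial and disease_surface >= substantial:
        return "present"
    if healthy_surface < substantial and disease_surface < substantial:
        return "absent"
    return "indeterminate"


# Illustrative values only; not measured data.
first = {"A1": 10.0, "A2": 10.0, "A3": 10.0, "A4": 10.0}
third = {"A1": 10.0, "A2": 30.0, "A3": 10.0, "A4": 2.0}
second = {"B1": 1.0, "B2": 8.0}
fourth = {"B1": 20.0, "B2": 8.0}
call = diagnose(first, second, third, fourth)
```

In practice the cutoff for "substantial" would have to be set from control experiments; the values used here carry no significance.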

A similar embodiment utilizes the single surface bearing both
the healthy test surface defined oligonucleotide/polynucleotides and the
disease/infection test surface defined oligonucleotide/polynucleotides as described
above. Parallel process steps as described above for detection of genes differentially
expressed in disease and infected states are followed, resulting in a first hybridization
pattern (healthy control sample with single healthy and disease/infection test sample)
and a second hybridization pattern (disease/infection sample with another copy of the
single healthy and disease/infection test sample).

WO 95/21944    PCT/US95/01863

Diagnosis is accomplished by comparing the two hybridization
patterns, wherein substantial differences between the first and second hybridization
patterns indicate the presence of the selected disease or infection in the animal being
tested. Substantially similar first and second hybridization patterns indicate the
absence of disease or infection. This, like many of the foregoing embodiments, may
use known or unknown ESTs derived from many libraries.

C. Other Methods of the Invention

As is obvious to one of skill in the art upon reading this
disclosure, the compositions and methods of this invention may also be used for other
similar purposes. For example, the general methods and compositions may be
adapted easily by manipulation of the samples selected to provide the standardized
defined oligonucleotide/polynucleotides, and selection of the samples selected for
hybridization thereto. One such modification is the use of this invention to identify
cell markers of any type, e.g., markers of cancer cells, stem cell markers, and the like.
Another modification involves the use of the method and compositions to generate
hybridization patterns useful for forensic identification or an 'expression fingerprint'
of genes for identification of one member of a species from another. Similarly, the
methods of this invention may be adapted for use in tissue matching for
transplantation purposes as well as for molecular histology, i.e., to enable diagnosis of
disease or disorders in pathology tissue samples such as biopsies. Still another use of
this method is in monitoring the effects of development and aging upon the gene
expression in a selected animal, by preparing surfaces bearing
oligonucleotide/polynucleotides prepared from samples of standardized younger
members of the species being tested. Additionally, the patient can serve as an internal
control by virtue of having the method applied to blood samples every 5-10 years
during his lifetime.

Still another intriguing use of this method is in the area of
monitoring the effects of drugs on gene expression, both in laboratories and during
clinical trials with animals, especially humans. Because the method can be readily
adapted by altering the above parameters, it can essentially be employed to identify
differentially expressed genes of any organism, at any stage of development, and
under the influence of any factor which can affect gene expression.

IV. The Genes and Proteins Identified

Application of the compositions and methods of this invention as
above described also provides other compositions, such as any isolated gene sequence
which is differentially expressed between a normal healthy animal and an animal
having a disease or infection. Another embodiment of this invention is any isolated
pathogen gene sequence which is expressed in tissue or cell samples of an infected
animal. Similarly, an embodiment of this invention is any gene sequence identified by
the methods described herein.

These gene sequences may be employed in conventional methods to
produce isolated proteins encoded thereby. To produce a protein of this invention,
the DNA sequences of a desired gene identified by the use of the methods of this
invention or portions thereof are inserted into a suitable expression system.
Desirably, a recombinant molecule or vector is constructed in which the
polynucleotide sequence encoding the protein is operably linked to a heterologous
expression control sequence permitting expression of the human protein. Numerous
types of appropriate expression vectors and host cell systems are known in the art for
mammalian (including human) expression, insect, e.g., baculovirus expression, yeast,
fungal, and bacterial expression, by standard molecular biology techniques.

The transfection of these vectors into appropriate host cells, whether
mammalian, bacterial, fungal, or insect, or into appropriate viruses, can result in
expression of the selected proteins. Suitable host cells or cell lines for transfection,
and viruses, as well as methods for the construction and transfection of such host cells
and viruses are well-known. Suitable methods for transfection, culture, amplification,
screening, and product production and purification are also known in the art.

The genes and proteins identified by this invention can be employed, if
desired, in diagnostic compositions useful for the diagnosis of a disease or infection
using conventional diagnostic assays. For example, a diagnostic reagent can be
developed which detectably targets a gene sequence or protein of this invention in a
biological sample of an animal. Such a reagent may be a complementary nucleotide
sequence, an antibody (monoclonal, recombinant or polyclonal), or a chemically
derived agonist or antagonist. Alternatively, the proteins and polynucleotide
sequences of this invention, fragments of same, or complementary sequences thereto,
may themselves be useful as diagnostic reagents for diagnosing disease states with
which the ESTs of the invention are associated. These reagents may optionally be
labelled using diagnostic labels, such as radioactive labels, colorimetric enzyme label
systems and the like conventionally used in diagnostic or therapeutic methods, e.g.,
Northern and Western blotting, antigen-antibody binding and the like. The selection
of the appropriate assay format and label system is within the skill of the art and may
readily be chosen without requiring additional explanation by resort to the wealth of
art in the diagnostic area.

Additionally, genes and proteins identified according to this invention
may be used therapeutically. For example, the EST-containing gene sequences may
be useful in gene therapy, to provide a gene sequence which in a disease is not
properly or sufficiently expressed. In such a method, a selected gene sequence of this
invention is introduced into a suitable vector or other delivery system for delivery to a
cell containing a defect in the selected gene. Suitable delivery systems are well
known to those of skill in the art and enable the desired EST or gene to be
incorporated into the target cell and to be translated by the cell. The EST or gene
sequence may be introduced to mutate the existing gene by recombination or provide
an active copy thereof in addition to the inactive gene to replace its function.

Alternatively, a protein encoded by an EST or gene of the invention
may be useful as a therapeutic reagent for delivery of a biologically active protein,
particularly when the disease state is associated with a deficiency of this protein.
Such a protein may be incorporated into an appropriate therapeutic formulation, alone
or in combination with other active ingredients. Methods of formulating such
therapeutic compositions, as well as suitable pharmaceutical carriers, and the like, are
well known to those of skill in the art. Still an additional method of delivering the
missing protein encoded by an EST, or the gene from which a selected EST was
derived, involves expressing it directly in vivo. Systems for such in vivo expression
are well known in the art.

Yet another use of the ESTs, genes identified according to the methods
of this invention, or the proteins encoded thereby is a target for the screening and
development of natural or synthetic chemical compounds which have utility as
therapeutic drugs for the treatment of disease states associated with the identified
genes and ESTs derived therefrom. As one example, a compound capable of binding
to such a protein encoded by such a gene and either preventing or enhancing its
biological activity may be a useful drug component for the treatment or prevention of
such disease states.

Conventional assays and techniques may be used for the screening and
development of such drugs. As one example, a method for identifying compounds
which specifically bind to or inhibit or activate proteins encoded by these gene
sequences can include simply the steps of contacting a selected protein or gene
product, with a test compound to permit binding of the test compound to the protein;
and determining the amount of test compound, if any, which is bound to the protein.
Such a method may involve the incubation of the test compound and the protein
immobilized on a solid support. Still other conventional methods of drug screening
can involve employing a suitable computer program to determine compounds having
similar or complementary chemical structures to that of the gene product or portions
thereof and screening those compounds either for competitive binding to the protein
to detect enhanced or decreased activity in the presence of the selected compound.

Thus, through use of such methods, the present invention is anticipated
to provide compounds capable of interacting with these genes, ESTs, or encoded
proteins, or fragments thereof, and either enhancing or decreasing the biological
activity, as desired. Such compounds are believed to be encompassed by this
invention.

Numerous modifications and variations of the present invention are
included in the above-identified specification and are expected to be obvious to one of
skill in the art. Such modifications and alterations to the compositions and processes
of the present invention are believed to be encompassed in the scope of the claims
appended hereto.

WHAT IS CLAIMED IS:

1. A method for identifying genes which are differentially expressed in
two different pre-determined states of an organism comprising:

a. providing a first surface on which is immobilized at pre-defined
regions on said surface a plurality of defined oligonucleotide/polynucleotide
sequences, each sequence selected from the group consisting of a fragment of an EST,
an entire EST a fragment of a gene or an entire gene, isolated from a DNA library
prepared from at least one selected cell, tissue, organ or organism sample in a first
state and present in excess relative to the polynucleotide to be hybridized;

b. providing a second surface on which is immobilized at pre-defined
regions on said surface a plurality of defined oligonucleotide/polynucleotide
sequences, each sequence selected from the group consisting of a fragment of an EST,
an entire EST a fragment of a gene or an entire gene, isolated from a DNA library
prepared from at least one selected cell, tissue, organ or organism sample in a second
state and present in excess relative to the polynucleotide to be hybridized;

c. detectably hybridizing to a set of said first and second surfaces
polynucleotide sequences isolated from a sample from a said organism in said first
state, said sample selected from sources analogous to the sources of step (a), said
hybridization sufficient to form a first and second hybridization pattern on each said
first and second surface,

d. detectably hybridizing to a set of said first and second surfaces
polynucleotide sequences isolated from a sample from said organism in said second
state, said sample selected from sources analogous to the sources of step (c), said
hybridization sufficient to form a third and fourth hybridization pattern on each said
first and second surface,

e. comparing at least two of the four hybridization patterns,
wherein genes differentially expressed in said first and second states are identified by
the presence of differences in the hybridization patterns at pre-defined regions;

f. identifying the oligonucleotide/polynucleotides on each surface
which correspond to said pattern differences and the corresponding ESTs or larger
gene fragment from which the oligonucleotide/polynucleotides were obtained,
whereby identification of the EST or larger gene fragment permits identification of
the gene from which the ESTs or larger gene fragment were derived.

2. The method according to Claim 1 wherein said first and second states are
respectively healthy and disease; pathogen uninfected and pathogen infected; a first
progression state and a second progression of a disease or infection; a first treatment
state and a second treatment state of a disease or infection; or a first developmental
and a second developmental state.

3. The method according to Claim 1 wherein said organism is a plant or an
animal.

4. The method according to Claim 3 wherein said animal is a human.

5. A method for identifying genes which are differentially expressed in a
normal healthy animal and an animal having a disease comprising:

a. providing a first surface on which is immobilized at pre-defined
regions on said surface a plurality of defined oligonucleotide/polynucleotide
sequences, each sequence each sequence selected from the group consisting of a
fragment of an EST, an entire EST a fragment of a gene or an entire gene, isolated
from a DNA library prepared from at least one selected cell, tissue, organ or organism
sample in a healthy animal and present in excess relative to the polynucleotide to be
hybridized;

b. providing a second surface on which is immobilized at pre-defined
regions of said surface a plurality of defined oligonucleotide/polynucleotide
sequences, each sequence each sequence selected from the group consisting of a
fragment of an EST, an entire EST a fragment of a gene or an entire gene, isolated
from a DNA library prepared from at least one selected cell, tissue, organ or organism
sample from an animal having said disease and present in excess relative to the
polynucleotide to be hybridized;

c. detectably hybridizing to a set of said first and second surfaces
polynucleotide sequences isolated from a sample from a healthy animal, said sample
selected from sources analogous to the sources of step (a), said hybridization
sufficient to form a first and second hybridization pattern on each said first and
second surface, said sample selected from a cell or tissue sample analogous to the
sample of step (a), said hybridization sufficient to form a first and second
hybridization pattern on each said first and second surface;

d. detectably hybridizing to a set of said first and second surfaces
polynucleotide sequences isolated from a sample from an animal having said disease,
said sample selected from a cell or tissue sample analogous to the sample of step (c),
said hybridization sufficient to form a third and fourth hybridization pattern on each
said first and second surface,

e. comparing at least two of the four hybridization patterns,
wherein genes differentially expressed in said first and second states are identified by
the presence of differences in the hybridization patterns at pre-defined regions;

f. identifying the oligonucleotide/polynucleotides on each surface
which correspond to said pattern differences and the corresponding ESTs or larger
gene fragment from which the oligonucleotide/polynucleotides were obtained,
whereby identification of the EST or larger gene fragment permits identification of
the gene from which the ESTs or larger gene fragment were derived.

6. A method for identifying genes which are differentially expressed in a
normal healthy animal and an animal having a disease comprising:

a. providing a surface on which is immobilized at pre-defined
regions on said surface a plurality of defined oligonucleotide/polynucleotide
sequences, each sequence selected from the group consisting of a fragment of an EST,
an entire EST a fragment of a gene or an entire gene isolated from a DNA library
prepared from the group selected from at least one selected cell, tissue, organ or
organism sample in of a healthy animal and an analogous selected sample of an
animal having said disease and both present in excess relative to the polynucleotide to
be hybridized;

b. detectably hybridizing to a first copy of said surface
polynucleotide sequences isolated from a healthy animal, said sample selected from a
cell or tissue sample analogous to the sample of step (a), said hybridization sufficient
to form a first hybridization pattern on said surface;

c. detectably hybridizing to a second copy of said surface
polynucleotide sequences isolated from an animal having said disease, said sample
selected from a cell or tissue sample analogous to the sample of step (a), said
hybridization sufficient to form a second hybridization pattern on said surface;

d. comparing the two hybridization patterns, wherein genes
differentially expressed in a disease state are identified by the presence of differences
in the hybridization patterns at pre-defined regions;

e. identifying the oligonucleotide/polynucleotides on each surface
which correspond to said pattern differences and the corresponding ESTs from which
the oligonucleotide/polynucleotides are obtained, whereby identification of the EST
permits identification of the gene from which the ESTs were derived.

7. A method for identifying a gene of a pathogen which is expressed in a
biological sample of an animal infected with said pathogen comprising:

a. providing a first surface on which is immobilized at pre-defined
regions on said surface a plurality of defined oligonucleotide/polynucleotide
sequences, each sequence selected from the group consisting of a fragment of an EST,
an entire EST a fragment of a gene or an entire gene isolated from a DNA library
prepared from at least one selected cell, tissue, organ or organism sample of a
healthy, uninfected animal and present in excess relative to the polynucleotide to be
hybridized;

b. providing a second surface on which is immobilized at pre-defined
regions of said surface a plurality of defined oligonucleotide/polynucleotide
sequences, each sequence selected from the group consisting of a fragment of an EST,
an entire EST a fragment of a gene or an entire gene isolated from at least one
selected cell, tissue, organ or organism sample of an infected animal;

c. detectably hybridizing to a set of said first and second surfaces
polynucleotide sequences isolated from a sample from a healthy animal, said sample
selected from a cell or tissue sample analogous to the sample of step (a), said
hybridization sufficient to form first and second hybridization patterns on each said
first and second surface,

d. detectably hybridizing to a set of said first and second surfaces
polynucleotide sequences isolated from a sample from an infected animal, said
sample selected from a cell or tissue sample analogous to the sample of step (a), said
hybridization sufficient to form third and fourth hybridization patterns on each said
first and second surface,

e. comparing the four hybridization patterns, wherein genes of
said pathogen which are expressed in an infected animal are identified by the
presence of differences in the hybridization patterns at pre-defined regions;

f. identifying the oligonucleotide/polynucleotides on each surface
which correspond to said pattern differences and the corresponding ESTs from which
the oligonucleotide/polynucleotides are obtained, whereby identification of the EST
permits identification of the gene from which the ESTs were derived.

8. A method for identifying a gene of a pathogen which is expressed in a
biological sample of an animal infected with said pathogen comprising:

a. providing a surface on which is immobilized at pre-defined
regions on said surface a plurality of defined oligonucleotide/polynucleotide
sequences, each sequence selected from the group consisting of a fragment of an EST,
an entire EST a fragment of a gene or an entire gene isolated from a DNA library
prepared from the group selected from at least one selected cell, tissue, organ or
organism sample in of a healthy animal and an analogous selected sample of an
animal having said disease and both present in excess relative to the polynucleotide to
be hybridized

b. detectably hybridizing to a first copy of said surface
polynucleotide sequences isolated from a sample from a healthy animal, said sample
selected from a cell or tissue sample analogous to the sample of step (a), said
hybridization sufficient to form a first hybridization pattern on said surface;

c. detectably hybridizing to a second copy of said surface
polynucleotide sequences isolated from a sample from an infected animal, said
sample selected from a cell or tissue sample analogous to the sample of step (a), said
hybridization sufficient to form a second hybridization pattern on said surface;

d. comparing the two hybridization patterns, wherein genes of
said pathogen which are expressed in an infected animal are identified by the
presence of differences in the hybridization patterns at pre-defined regions;

e. identifying the oligonucleotide/polynucleotides on each surface
which correspond to said pattern differences and the corresponding ESTs from which
the oligonucleotide/polynucleotides are obtained, whereby identification of the EST
permits identification of the gene from which the ESTs were derived.

9. A composition suitable for use in hybridization comprising a solid 
surface on which is immobilized at pre-defined regions on said surface a plurality of 
defined oligonucleotide/polynucleotide sequences for hybridization, each sequence 

30 selected fix)m the group consisting of a fragment of an EST, an entire EST a fragment 
of a gene or an entiire gene isolated from a DNA library prepared from the group 
selected from at least one selected cell tissue, organ or organism sample of a healthy 
animal, at least one analogous sample of said animal having a disease, at least one 
analogous sample of said animal infected m\h a microbial pathogen, and any 

35 combination thereof. 
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10. An isolated gene sequence which is differentially expressed in a 
normal healthy animal and an animal having a disease, identified by the method of 
claim 1. 

5 11. An isolated pathogen gene sequence which is expressed in tissue or 

cell samples of an infected animal identified by the method of claim 7. 

12. A diagnostic composition useful for the diagnosis of a disease
comprising a reagent capable of detectably targeting a gene sequence of claim 10 in a
biological sample of an animal.

13. A diagnostic composition useful for the diagnosis of infection by a
pathogen comprising a reagent capable of detectably targeting a gene sequence of
claim 11 in a biological sample of an animal.

14. An isolated protein produced by expression of a gene sequence of 
claim 10. 

15. An isolated pathogen protein produced by expression of a gene
sequence of claim 11.

16. A therapeutic composition comprising a protein or fragment thereof
selected from the group consisting of a protein of claim 10 and a protein of claim 15.

17. A method for diagnosing a selected disease or infection in an animal
comprising:

a. providing a first surface on which is immobilized at pre-defined
regions on said surface a plurality of defined oligonucleotide/polynucleotide
sequences, each sequence selected from the group consisting of a fragment of an EST,
an entire EST, a fragment of a gene or an entire gene, isolated from a DNA library
prepared from at least one selected cell, tissue, organ or organism sample of a healthy
animal and present in excess relative to the polynucleotide to be hybridized;

b. providing a second surface on which is immobilized at pre-defined
regions of said surface a plurality of defined oligonucleotide/polynucleotide
sequences, each sequence comprising a fragment of an EST isolated from at least one
said tissue of an animal having said disease;

c. detectably hybridizing to a set of said first and second surfaces
polynucleotide sequences isolated from a DNA library prepared from a sample from a
healthy animal, said sample selected from a cell or tissue sample analogous to the
sample of step (a), said hybridization sufficient to form a first and second
hybridization pattern on each said first and second surface;

d. detectably hybridizing to a set of said first and second surfaces
polynucleotide sequences isolated from a DNA library prepared from a sample from
an animal having said disease, said sample selected from a cell or tissue sample
analogous to the sample of step (c), said hybridization sufficient to form a third and
fourth hybridization pattern on each said first and second surface;

e. comparing the four hybridization patterns, wherein substantial
differences between the first and third hybridization patterns and the second and
fourth hybridization patterns indicates the presence of said selected disease or
infection in said animal, and substantial similarities in said first and third
hybridization patterns and second and fourth hybridization patterns indicates the
absence of disease or infection.

18. A method for diagnosing a selected disease or infection in an animal
comprising:

a. providing a surface on which is immobilized at pre-defined
regions on said surface a plurality of defined oligonucleotide/polynucleotide
sequences, each sequence comprising a fragment of an EST isolated from a DNA
library prepared from the group consisting of a selected cell or tissue sample of a
healthy animal and an analogous selected cell or tissue sample of an animal having
said disease;

b. detectably hybridizing to a first copy of said surface
polynucleotide sequences isolated from a sample from a healthy animal, said sample
selected from a cell or tissue sample analogous to the sample of step (a), said
hybridization sufficient to form a first hybridization pattern on said surface;

c. detectably hybridizing to a second copy of said surface
polynucleotide sequences isolated from a DNA library prepared from a sample from
an animal having said disease, said sample selected from a cell or tissue sample
analogous to the sample of step (a), said hybridization sufficient to form a second
hybridization pattern on said surface;

d. comparing the two hybridization patterns, wherein substantial
differences between the first and second hybridization patterns indicates the presence
of said selected disease or infection in said animal, and substantial similarities in said
first and second hybridization patterns indicates the absence of disease or infection.

COMPARATIVE GENE TRANSCRIPT ANALYSIS

1. FIELD OF INVENTION 
The present invention is in the field of molecular 
biology and computer science; more particularly, the 
present invention describes methods of analyzing gene 
transcripts and diagnosing the genetic expression of cells 
and tissue. 

2. BACKGROUND OF THE INVENTION 
Until very recently, the history of molecular biology 
has been written one gene at a time. Scientists have 
observed the cell's physical changes, isolated mixtures 
from the cell or its milieu, purified proteins, sequenced 
proteins and therefrom constructed probes to look for the 
corresponding gene.

Recently, different nations have set up massive
projects to sequence the billions of bases in the human
genome. These projects typically begin with dividing the
genome into large portions of chromosomes and then
determining the sequences of these pieces, which are then
analyzed for identity with known proteins or portions
thereof, known as motifs. Unfortunately, the majority of
genomic DNA does not encode proteins and though it is
postulated to have some effect on the cell's ability to
make protein, its relevance to medical applications is not
understood at this time.

A third methodology involves sequencing only the
transcripts encoding the cellular machinery actively
involved in making protein, namely the mRNA. The advantage
is that the cell has already edited out all the non-coding
DNA, and it is relatively easy to identify the
protein-coding portion of the RNA. The utility of this approach
was not immediately obvious to genomic researchers. In
fact, when cDNA sequencing was initially proposed, the
method was roundly denounced by those committed to genomic
sequencing. For example, the head of the U.S. Human Genome
project discounted cDNA sequencing as not valuable and
refused to approve funding of projects.

In this disclosure, we teach methods for analyzing
DNA, including cDNA libraries. Based on our analyses and
research, we see each individual gene product as a "pixel"
of information, which relates to the expression of that,
and only that, gene. We teach herein methods whereby the
individual "pixels" of gene expression information can be
combined into a single gene transcript "image," in which
each of the individual genes can be visualized
simultaneously, allowing relationships between the gene
pixels to be easily visualized and understood.

We further teach a new method which we call electronic
subtraction. Electronic subtraction will enable the gene
researcher to turn a single image into a moving picture,
one which describes the temporality or dynamics of gene
expression, at the level of a cell or a whole tissue. It
is that sense of "motion" of cellular machinery on the
scale of a cell or organ which constitutes the new
invention herein. This constitutes a new view into the
process of living cell physiology and one which holds great
promise to unveil and discover new therapeutic and
diagnostic approaches in medicine.

We teach another method which we call "electronic 
northern," which tracks the expression of a single gene 
across many types of cells and tissues. 
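
The idea of an "electronic northern" can be illustrated with a minimal
sketch. It assumes each library has already been reduced to a list holding
one gene annotation per sequenced clone; the function name, the library
names and the gene labels below are hypothetical and do not come from this
disclosure.

```python
from collections import Counter

def electronic_northern(libraries, gene):
    """Return {library name: fraction of clones annotated as `gene`}."""
    profile = {}
    for name, annotations in libraries.items():
        counts = Counter(annotations)
        total = len(annotations)
        # An empty library contributes a fraction of 0.0 rather than an error.
        profile[name] = counts[gene] / total if total else 0.0
    return profile

# Hypothetical toy data: one annotation per sequenced clone.
libraries = {
    "HUVEC": ["actin", "EF-1-alpha", "actin", "fibronectin"],
    "monocyte": ["actin", "IL-1-beta"],
    "liver": ["albumin", "albumin", "EF-1-alpha"],
}
profile = electronic_northern(libraries, "actin")
```

Reporting a fraction rather than a raw count is a design choice of this
sketch; it keeps libraries of different sizes comparable.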

Nucleic acids (DNA and RNA) carry within their
sequence the hereditary information and are therefore the
prime molecules of life. Nucleic acids are found in all
living organisms including bacteria, fungi, viruses, plants
and animals. It is of interest to determine the relative
abundance of different discrete nucleic acids in different
cells, tissues and organisms over time under various
conditions, treatments and regimes.

All dividing cells in the human body contain the same
set of 23 pairs of chromosomes. It is estimated that these
autosomal and sex chromosomes encode approximately 100,000
genes. The differences among different types of cells are
believed to reflect the differential expression of the
100,000 or so genes. Fundamental questions of biology
could be answered by understanding which genes are
transcribed and knowing the relative abundance of
transcripts in different cells.

Previously, the art has only provided for the analysis
of a few known genes at a time by standard molecular
biology techniques such as PCR, northern blot analysis, or
other types of DNA probe analysis such as in situ
hybridization. Each of these methods allows one to analyze
the transcription of only known genes and/or small numbers
of genes at a time. Nucl. Acids Res. 19, 7097-7104 (1991);
Nucl. Acids Res. 18, 4833-42 (1990); Nucl. Acids Res. 18,
2789-92 (1989); European J. Neuroscience 2, 1063-1073
(1990); Analytical Biochem. 187, 364-73 (1990); Genet.
Annals Techn. Appl. 7, 64-70 (1990); GATA 8(4), 129-33
(1991); Proc. Natl. Acad. Sci. USA 85, 1696-1700 (1988);
Nucl. Acids Res. 19, 1954 (1991); Proc. Natl. Acad. Sci.
USA 88, 1943-47 (1991); Nucl. Acids Res. 19, 6123-27
(1991); Proc. Natl. Acad. Sci. USA 85, 5738-42 (1988);
Nucl. Acids Res. 16, 10937 (1988).

Studies of the number and types of genes whose
transcription is induced or otherwise regulated during cell
processes such as activation, differentiation, aging, viral
transformation, morphogenesis, and mitosis have been
pursued for many years, using a variety of methodologies.
One of the earliest methods was to isolate and analyze
levels of the proteins in a cell, tissue, organ system, or
even organisms both before and after the process of
interest. One method of analyzing multiple proteins in a
sample is using 2-dimensional gel electrophoresis, wherein
proteins can be, in principle, identified and quantified as
individual bands, and ultimately reduced to a discrete
signal. At present, 2-dimensional analysis only resolves
approximately 15% of the proteins. In order to positively
analyze those bands which are resolved, each band must be
excised from the membrane and subjected to protein sequence
analysis using Edman degradation. Unfortunately, most of
the bands were present in quantities too small to obtain a
reliable sequence, and many of those bands contained more
than one discrete protein. An additional difficulty is
that many of the proteins were blocked at the
amino-terminus, further complicating the sequencing
process.

Analyzing differentiation at the gene transcription
level has overcome many of these disadvantages and
drawbacks, since the power of recombinant DNA technology
allows amplification of signals containing very small
amounts of material. The most common method, called
"hybridization subtraction," involves isolation of mRNA
from the biological specimen before (B) and after (A) the
developmental process of interest, transcribing one set of
mRNA into cDNA, subtracting specimen B from specimen A
(mRNA from cDNA) by hybridization, and constructing a cDNA
library from the non-hybridizing mRNA fraction. Many
different groups have used this strategy successfully, and
a variety of procedures have been published and improved
upon using this same basic scheme. Nucl. Acids Res. 19,
7097-7104 (1991); Nucl. Acids Res. 18, 4833-42 (1990);
Nucl. Acids Res. 18, 2789-92 (1989); European J.
Neuroscience 2, 1063-1073 (1990); Analytical Biochem. 187,
364-73 (1990); Genet. Annals Techn. Appl. 7, 64-70 (1990);
GATA 8(4), 129-33 (1991); Proc. Natl. Acad. Sci. USA 85,
1696-1700 (1988); Nucl. Acids Res. 19, 1954 (1991); Proc.
Natl. Acad. Sci. USA 88, 1943-47 (1991); Nucl. Acids Res.
19, 6123-27 (1991); Proc. Natl. Acad. Sci. USA 85, 5738-42
(1988); Nucl. Acids Res. 16, 10937 (1988).

Although each of these techniques has particular
strengths and weaknesses, there are still some limitations
and undesirable aspects of these methods. First, the time
and effort required to construct such libraries is quite
large. Typically, a trained molecular biologist might
expect construction and characterization of such a library
to require 3 to 6 months, depending on the level of skill,
experience, and luck. Second, the resulting subtraction
libraries are typically inferior to the libraries
constructed by standard methodology. A typical
conventional cDNA library should have a clone complexity of
at least 10^6 clones, and an average insert size of 1-3 kB.
In contrast, subtracted libraries can have complexities of
10^2 or 10^3 and average insert sizes of 0.2 kB. Therefore,
there can be a significant loss of clone and sequence 
information associated with such libraries. Third, this
approach allows the researcher to capture only the genes
induced in specimen A relative to specimen B, not
vice-versa, nor does it easily allow comparison to a third
specimen of interest (C). Fourth, this approach requires
very large amounts (hundreds of micrograms) of "driver"
mRNA (specimen B), which significantly limits the number
and type of subtractions that are possible since many
tissues and cells are very difficult to obtain in large
quantities.

Fifth, the resolution of the subtraction is dependent
upon the physical properties of DNA:DNA or RNA:DNA
hybridization. The ability of a given sequence to find a
hybridization match is dependent on its unique CoT value.
The CoT value is a function of the number of copies
(concentration) of the particular sequence, multiplied by
the time of hybridization. It follows that for sequences
which are abundant, hybridization events will occur very
rapidly (low CoT value), while rare sequences will form
duplexes at very high CoT values. CoT values which allow
such rare sequences to form duplexes and therefore be
effectively selected are difficult to achieve in a
convenient time frame. Therefore, hybridization
subtraction is simply not a useful technique with which to
study relative levels of rare mRNA species. Sixth, this
problem is further complicated by the fact that duplex
formation is also dependent on the nucleotide base
composition for a given sequence. Those sequences rich in
G + C form stronger duplexes than those with high contents
of A + T. Therefore, the former sequences will tend to be
removed selectively by hybridization subtraction. Seventh,
it is possible that hybridization between nonexact matches
can occur. When this happens, the expression of a
homologous gene may "mask" expression of a gene of
interest, artificially skewing the results for that
particular gene.
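
The CoT argument in the fifth point can be made concrete with a small
worked sketch. It takes the description above literally (CoT as
concentration multiplied by time) and uses arbitrary, assumed units and
values; the function names are hypothetical.

```python
def cot_value(concentration, time):
    """CoT as described above: concentration multiplied by hybridization time."""
    return concentration * time

def time_to_reach(target_cot, concentration):
    """Time needed for a sequence at `concentration` to reach `target_cot`."""
    return target_cot / concentration

# Assumed, unitless illustration values.
abundant = 1.0            # concentration of an abundant transcript
rare = abundant / 1000    # a transcript 1,000-fold less abundant
target = 10.0             # CoT at which duplexes are assumed to form

t_abundant = time_to_reach(target, abundant)
t_rare = time_to_reach(target, rare)
slowdown = t_rare / t_abundant   # about 1,000-fold longer for the rare species
```

Under these assumptions the required hybridization time scales inversely
with abundance, which is why rare species are hard to select in a
convenient time frame.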

Matsubara and Okubo proposed using partial cDNA
sequences to establish expression profiles of genes which
could be used in functional analyses of the human genome.
Matsubara and Okubo warned against using random priming, as
it creates multiple unique DNA fragments from individual
mRNAs and may thus skew the analysis of the number of
particular mRNAs per library. They sequenced randomly
selected members from a 3'-directed cDNA library and
established the frequency of appearance of the various
ESTs. They proposed comparing lists of ESTs from various
cell types to classify genes. Genes expressed in many
different cell types were labeled housekeepers and those
selectively expressed in certain cells were labeled
cell-specific genes, even in the absence of the full sequence of
the gene or the biological activity of the gene product.

The present invention avoids the drawbacks of the 
prior art by providing a method to quantify the relative 
abundance of multiple gene transcripts in a given 
biological specimen by the use of high-throughput 
sequence-specific analysis of individual RNAs and/or their 
corresponding cDNAs. 

The present invention offers several advantages over
current protein discovery methods which attempt to isolate
individual proteins based upon biological effects. The
method of the instant invention provides for detailed
diagnostic comparisons of cell profiles revealing numerous
changes in the expression of individual transcripts.

The instant invention provides several advantages over 
current subtraction methods including a more complex 
library analysis (10^6 to 10^7 clones as compared to 10^3
clones) which allows identification of low abundance 
messages as well as enabling the identification of messages 
which either increase or decrease in abundance. These 
large libraries are very routine to make in contrast to the 
libraries of previous methods. In addition, homologues can
easily be distinguished with the method of the instant 
invention. 

This method is very convenient because it organizes a 
large quantity of data into a comprehensible, digestible 
format. The most significant differences are highlighted 
by electronic subtraction. In-depth analyses are made more
convenient.

The present invention provides several advantages over
previous methods of electronic analysis of cDNA. The
method is particularly powerful when more than 100 and
preferably more than 1,000 gene transcripts are analyzed.
In such a case, new low-frequency transcripts are
discovered and tissue typed.

High resolution analysis of gene expression can be 
used directly as a diagnostic profile or to identify 
disease-specific genes for the development of more classic 
diagnostic approaches. 

This process is defined as gene transcript frequency 
analysis. The resulting quantitative analysis of the gene 
transcripts is defined as comparative gene transcript 
analysis. 

3. SUMMARY OF THE INVENTION

The invention is a method of analyzing a specimen
containing gene transcripts comprising the steps of (a)
producing a library of biological sequences; (b) generating
a set of transcript sequences, where each of the transcript
sequences in said set is indicative of a different one of
the biological sequences of the library; (c) processing the
transcript sequences in a programmed computer (in which a
database of reference transcript sequences indicative of
reference sequences is stored), to generate an identified
sequence value for each of the transcript sequences, where
each said identified sequence value is indicative of
sequence annotation and a degree of match between one of
the biological sequences of the library and at least one of
the reference sequences; and (d) processing each said
identified sequence value to generate final data values
indicative of the number of times each identified sequence
value is present in the library.
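
Step (d) amounts to tallying identical identified sequence values. The
sketch below is one possible illustration, not the program of Table 5. It
assumes an identified sequence value can be represented as an (annotation,
degree of match) pair; the locus labels and match categories are made up.

```python
from collections import Counter

def final_data_values(identified_sequence_values):
    """Count how many times each identified sequence value occurs in a library."""
    return Counter(identified_sequence_values)

# Hypothetical identified sequence values, one per transcript sequence:
# (reference annotation, degree of match).
identified = [
    ("LOCUS_A", "exact"),
    ("LOCUS_B", "exact"),
    ("LOCUS_A", "exact"),
    ("LOCUS_A", "homologous"),
]
counts = final_data_values(identified)
```

Keeping the degree of match inside the key means an exact match and a
homologous match to the same reference are tallied separately, which is an
assumption of this sketch.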

The invention also includes a method of comparing two
specimens containing gene transcripts. The first specimen
is processed as described above. The second specimen is
used to produce a second library of biological sequences,
which is used to generate a second set of transcript
sequences, where each of the transcript sequences in the
second set is indicative of one of the biological sequences
of the second library. Then the second set of transcript
sequences is processed in a programmed computer to generate
a second set of identified sequence values, namely the
further identified sequence values, each of which is
indicative of a sequence annotation and includes a degree
of match between one of the biological sequences of the
second library and at least one of the reference sequences.
The further identified sequence values are processed to
generate further final data values indicative of the number
of times each further identified sequence value is present
in the second library. The final data values from the
first specimen and the further identified sequence values
from the second specimen are processed to generate ratios
of transcript sequences, which indicate the differences in
the number of gene transcripts between the two specimens.

In a further embodiment, the method includes
quantifying the relative abundance of mRNA in a biological
specimen by (a) isolating a population of mRNA transcripts
from a biological specimen; (b) identifying genes from
which the mRNA was transcribed by a sequence-specific
method; (c) determining the numbers of mRNA transcripts
corresponding to each of the genes; and (d) using the mRNA
transcript numbers to determine the relative abundance of
mRNA transcripts within the population of mRNA transcripts.

Also disclosed is a method of producing a gene
transcript image analysis by first obtaining a mixture of
mRNA, from which cDNA copies are made. The cDNA is
inserted into a suitable vector which is used to transfect
suitable host strain cells which are plated out and
permitted to grow into clones, each clone representing a
unique mRNA. A representative population of clones
transfected with cDNA is isolated. Each clone in the
population is identified by a sequence-specific method
which identifies the gene from which the unique mRNA was
transcribed. The number of times each gene is identified
to a clone is determined to evaluate gene transcript
abundance. The genes and their abundances are listed in
order of abundance to produce a gene transcript image.

In a further embodiment, the relative abundance of the
gene transcripts in one cell type or tissue is compared
with the relative abundance of gene transcript numbers in a
second cell type or tissue in order to identify the
differences and similarities.
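
The listing "in order of abundance" can be sketched as follows. This is a
minimal illustration under the assumption that each clone has already been
assigned a gene label; the function name and the gene labels are
hypothetical.

```python
from collections import Counter

def gene_transcript_image(clone_genes):
    """Return [(gene, count, relative abundance)] sorted by descending count."""
    counts = Counter(clone_genes)
    total = sum(counts.values())
    # Ties are broken alphabetically so the ordering is reproducible.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(gene, n, n / total) for gene, n in ranked]

# Hypothetical clone identifications, one gene label per clone.
clones = ["geneA", "geneB", "geneA", "geneC", "geneA", "geneB"]
image = gene_transcript_image(clones)
```

Two such images, built from different cell types or tissues, can then be
compared gene by gene on their relative abundances.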

In a further embodiment, the method includes a system
for analyzing a library of biological sequences including a
means for receiving a set of transcript sequences, where
each of the transcript sequences is indicative of a
different one of the biological sequences of the library;
and a means for processing the transcript sequences in a
computer system in which a database of reference transcript
sequences indicative of reference sequences is stored,
wherein the computer is programmed with software for
generating an identified sequence value for each of the
transcript sequences, where each said identified sequence
value is indicative of a sequence annotation and the degree
of match between a different one of the biological
sequences of the library and at least one of the reference
sequences, and for processing each said identified sequence
value to generate final data values indicative of the
number of times each identified sequence value is present
in the library.

In essence, the invention is a method and system for
quantifying the relative abundance of gene transcripts in a
biological specimen. The invention provides a method for
comparing the gene transcript image from two or more
different biological specimens in order to distinguish
between the two specimens and identify one or more genes
which are differentially expressed between the two
specimens. Thus, this gene transcript image and its
comparison can be used as a diagnostic. One embodiment of
the method generates high-throughput sequence-specific
analysis of multiple RNAs or their corresponding cDNAs: a
gene transcript image. Another embodiment of the method
produces the gene transcript imaging analysis by the use of
high-throughput cDNA sequence analysis. In addition, two
or more gene transcript images can be compared and used to
detect or diagnose a particular biological state, disease,
or condition which is correlated to the relative abundance
of gene transcripts in a given cell or population of cells.

4. DESCRIPTION OF THE TABLES AND DRAWINGS 
4.1. TABLES 

Table 1 presents a detailed explanation of the letter
codes utilized in Tables 2-5.

Table 2 lists the one hundred most common gene
transcripts. It is a partial list of isolates from the
HUVEC cDNA library prepared and sequenced as described
below. The left-hand column refers to the sequence's order
of abundance in this table. The next column labeled
"number" is the clone number of the first HUVEC sequence
identification reference matching the sequence in the
"entry" column number. Isolates that have not been
sequenced are not present in Table 2. The next column,
labeled "N", indicates the total number of cDNAs which have
the same degree of match with the sequence of the reference
transcript in the "entry" column.

The column labeled "entry" gives the NIH GENBANK locus
name, which corresponds to the library sequence numbers.
The "s" column indicates in a few cases the species of the
reference sequence. The code for column "s" is given in
Table 1. The column labeled "descriptor" provides a plain
English explanation of the identity of the sequence
corresponding to the NIH GENBANK locus name in the "entry"
column.

Table 3 is a comparison of the top fifteen most 
abundant gene transcripts in normal monocytes and activated 
macrophage cells. 

Table 4 is a detailed summary of library subtraction
analysis comparing the THP-1 and human macrophage
cDNA sequences. In Table 4, the same code as in Table 2 is
used. Additional columns are for "bgfreq" (abundance
number in the subtractant library), "rfend" (abundance
number in the target library) and "ratio" (the target
abundance number divided by the subtractant abundance
number). As is clear from perusal of the table, when the
abundance number in the subtractant library is "0", the
target abundance number is divided by 0.05. This is a way
of obtaining a result (not possible dividing by 0) and
distinguishing the result from ratios of subtractant
numbers of 1.
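
The ratio rule described for Table 4 can be sketched as below. This is an
illustration only and is not the source code of Table 5; the function name,
the constant name and the gene labels are hypothetical, while the 0.05
substitute divisor is taken from the description above.

```python
ZERO_SUBSTITUTE = 0.05  # divisor used when the subtractant abundance is 0

def subtraction_ratios(target_counts, subtractant_counts):
    """Return [(gene, bgfreq, rfend, ratio)] sorted by descending ratio."""
    rows = []
    for gene, rfend in target_counts.items():
        bgfreq = subtractant_counts.get(gene, 0)
        divisor = bgfreq if bgfreq > 0 else ZERO_SUBSTITUTE
        rows.append((gene, bgfreq, rfend, rfend / divisor))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows

# Hypothetical abundance numbers for a target and a subtractant library.
target = {"geneA": 4, "geneB": 2, "geneC": 3}
subtractant = {"geneA": 2, "geneC": 1}
rows = subtraction_ratios(target, subtractant)
```

With these toy numbers, geneB (absent from the subtractant library) gets a
ratio of about 40 rather than 2, so it cannot be confused with a gene seen
once in the subtractant library.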

Table 5 is the computer program, written in source
code, for generating gene transcript subtraction profiles.

Table 6 is a partial listing of database entries used
in the electronic northern blot analysis as provided by the
present invention.

4.2. BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 is a chart summarizing data collected and 
stored regarding the library construction portion of 
sequence preparation and analysis. 

Figure 2 is a diagram representing the sequence of 
operations performed by "abundance sort" software in a 
class of preferred embodiments of the inventive method. 

Figure 3 is a block diagram of a preferred embodiment 
of the system of the invention.

Figure 4 is a more detailed block diagram of the
bioinformatics process from new sequence (that has already
been sequenced but not identified) to printout of the
transcript imaging analysis and the provision of database
subscriptions.

5. DETAILED DESCRIPTION OF THE INVENTION

The present invention provides a method to compare the
relative abundance of gene transcripts in different
biological specimens by the use of high-throughput
sequence-specific analysis of individual RNAs or their
corresponding cDNAs (or alternatively, of data representing
other biological sequences). This process is denoted
herein as gene transcript imaging. The quantitative
analysis of the relative abundance for a set of gene
transcripts is denoted herein as "gene transcript image
analysis" or "gene transcript frequency analysis". The
present invention allows one to obtain a profile for gene
transcription in any given population of cells or tissue
from any type of organism. The invention can be applied to
obtain a profile of a specimen consisting of a single cell
(or clones of a single cell), or of many cells, or of
tissue more complex than a single cell and containing
multiple cell types, such as liver.

The invention has significant advantages in the fields
of diagnostics, toxicology and pharmacology, to name a few.
A highly sophisticated diagnostic test can be performed on
the ill patient in whom a diagnosis has not been made. A
biological specimen consisting of the patient's fluids or
tissues is obtained, and the gene transcripts are isolated
and expanded to the extent necessary to determine their
identity. Optionally, the gene transcripts can be
converted to cDNA. A sampling of the gene transcripts is
subjected to sequence-specific analysis and quantified.
These gene transcript sequence abundances are compared
against reference database sequence abundances including
normal data sets for diseased and healthy patients. The
patient has the disease(s) with which the patient's data
set most closely correlates.

For example, gene transcript frequency analysis can be
used to differentiate normal cells or tissues from diseased
cells or tissues, just as it highlights differences between
normal monocytes and activated macrophages in Table 3.
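
Matching a patient's data set to the reference data set it "most closely
correlates" with could be sketched as follows. The disclosure does not name
a correlation measure; Pearson correlation is used here purely as an
assumption, and the function names, profile labels and abundance values are
hypothetical. The sketch also assumes at least one reference profile and at
least one gene.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def closest_reference(patient, references):
    """Return (best reference name, {name: correlation}) for a patient profile."""
    # Compare over the union of genes; a gene missing from a profile counts as 0.
    genes = sorted(set(patient).union(*references.values()))
    p = [patient.get(g, 0.0) for g in genes]
    scores = {
        name: pearson(p, [ref.get(g, 0.0) for g in genes])
        for name, ref in references.items()
    }
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical relative abundances (fractions of clones per gene).
patient = {"geneA": 0.5, "geneB": 0.3, "geneC": 0.2}
references = {
    "healthy": {"geneA": 0.2, "geneB": 0.3, "geneC": 0.5},
    "diseased": {"geneA": 0.6, "geneB": 0.3, "geneD": 0.1},
}
best, scores = closest_reference(patient, references)
```

Other similarity measures (for example rank correlation) would fit the same
interface; the choice here is only for illustration.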

In toxicology, a fundamental question is which tests
are most effective in predicting or detecting a toxic
effect. Gene transcript imaging provides highly detailed
information on the cell and tissue environment, some of
which would not be obvious in conventional, less detailed
screening methods. The gene transcript image is a more
powerful method to predict drug toxicity and efficacy.
Similar benefits accrue in the use of this tool in
pharmacology. The gene transcript image can be used
selectively to look at protein categories which are
expected to be affected, for example, enzymes which
detoxify toxins.

In an alternative embodiment, comparative gene
transcript frequency analysis is used to differentiate
between cancer cells which respond to anti-cancer agents
and those which do not respond. Examples of anti-cancer
agents are tamoxifen, vincristine, vinblastine,
podophyllotoxins, etoposide, teniposide, cisplatin,
biologic response modifiers such as interferon, IL-2,
GM-CSF, enzymes, hormones and the like. This method also
provides a means for sorting the gene transcripts by
functional category. In the case of cancer cells,
transcription factors or other essential regulatory
molecules are very important categories to analyze across
different libraries.

In yet another embodiment, comparative gene transcript
frequency analysis is used to differentiate between control
liver cells and liver cells isolated from patients treated
with experimental drugs like FIAU to distinguish between
pathology caused by the underlying disease and that caused
by the drug.

In yet another embodiment, comparative gene transcript
frequency analysis is used to differentiate between brain
tissue from patients treated and untreated with lithium.

In a further embodiment, comparative gene transcript
frequency analysis is used to differentiate between
cyclosporin and FK506-treated cells and normal cells.

In a further embodiment, comparative gene transcript
frequency analysis is used to differentiate between virally
infected (including HIV-infected) human cells and
uninfected human cells. Gene transcript frequency analysis
is also used to rapidly survey gene transcripts in
HIV-resistant, HIV-infected, and HIV-sensitive cells.
Comparison of gene transcript abundance will indicate the
success of treatment and/or new avenues to study.

In a further embodiment, comparative gene transcript
frequency analysis is used to differentiate between
bronchial lavage fluids from healthy and unhealthy patients
with a variety of ailments.

In a further embodiment, comparative gene transcript
frequency analysis is used to differentiate between cell,
plant, microbial and animal mutants and wild-type species.
In addition, the transcript abundance program is adapted to
permit the scientist to evaluate the transcription of one
gene in many different tissues. Such comparisons could
identify deletion mutants which do not produce a gene
product and point mutants which produce a less abundant or
otherwise different message. Such mutations can affect
basic biochemical and pharmacological processes, such as
mineral nutrition and metabolism, and can be isolated by
means known to those skilled in the art. Thus, crops with
improved yields, pest resistance and other factors can be
developed.

In a further embodiment, comparative gene transcript 
frequency analysis is used for an interspecies comparative
analysis which would allow for the selection of better
pharmacologic animal models. In this embodiment, humans
and other animals (such as a mouse), or their cultured
cells, are treated with a specific test agent. The relative
sequence abundance of each cDNA population is determined.
If the animal test system is a good model, homologous genes
in the animal cDNA population should change expression
similarly to those in human cells. If side effects are
detected with the drug, a detailed transcript abundance
analysis will be performed to survey gene transcript
changes. Models will then be evaluated by comparing basic
physiological changes.

In a further embodiment, comparative gene transcript 
frequency analysis is used in a clinical setting to give a 
highly detailed gene transcript profile of a patient's
cells or tissue (for example, a blood sample). In
particular, gene transcript frequency analysis is used to 
give a high resolution gene expression profile of a 
diseased state or condition. 

In the preferred embodiment, the method utilizes
high-throughput cDNA sequencing to identify specific
transcripts of interest. The generated cDNA and deduced
amino acid sequences are then extensively compared with
GENBANK and other sequence data banks as described below.
The method offers several advantages over current protein
discovery by two-dimensional gel methods which try to
identify individual proteins involved in a particular
biological effect. Here, detailed comparisons of profiles
of activated and inactive cells reveal numerous changes in
the expression of individual transcripts. After it is
determined whether the sequence is an "exact" match, a
similar match or a non-match, the sequence is entered into
a database.

Next, the numbers of copies of cDNA corresponding to each
gene are tabulated. Although this can be done slowly and
arduously, if at all, by human hand from a printout of all
entries, a computer program is a useful and rapid way to
tabulate this information. The numbers of cDNA copies
(optionally divided by the total number of sequences in the
data set) provide a picture of the relative abundance of
transcripts for each corresponding gene. The list of
represented genes can then be sorted by abundance in the
cDNA population. A multitude of additional types of
comparisons or dimensions are possible and are exemplified
below.
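
The tabulation described above can be sketched in a few lines of
Python. This is only an illustration, not the program referred to in
this specification: the function name tabulate_abundance and the
sample gene labels are hypothetical, and each clone is assumed to
have already been assigned a gene identifier by the matching step.

```python
from collections import Counter

def tabulate_abundance(gene_ids, normalize=True):
    """Count cDNA copies per gene and sort genes by abundance.

    gene_ids: one identifier per sequenced clone (hypothetical input).
    Returns (gene, copies, value) rows, where value is the count or,
    if normalize is True, the count divided by the number of clones.
    """
    counts = Counter(gene_ids)
    total = len(gene_ids)
    rows = []
    for gene, copies in counts.items():
        value = copies / total if normalize and total else copies
        rows.append((gene, copies, value))
    # Most abundant first; ties are broken alphabetically.
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows

clones = ["actin", "EF-1a", "actin", "unknown_17", "actin", "EF-1a"]
table = tabulate_abundance(clones)
for gene, copies, fraction in table:
    print(gene, copies, round(fraction, 3))
```

Dividing by the total number of clones is what lets data sets of
different sizes be compared on the same scale.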

An alternate method of producing a gene transcript
image includes the steps of obtaining a mixture of test
mRNA and providing a representative array of unique probes
whose sequences are complementary to at least some of the
test mRNAs. Next, a fixed amount of the test mRNA is added
to the arrayed probes. The test mRNA is incubated with the
probes for a sufficient time to allow hybrids of the test
mRNA and probes to form. The mRNA-probe hybrids are
detected and the quantity determined. The hybrids are
identified by their location in the probe array. The
quantities of all hybrids are summed to give a population
number. Each hybrid quantity is divided by the population
number to provide a set of relative abundance data termed a
gene transcript image analysis.
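
The final normalization step of this hybridization method can be
sketched as follows. The function name transcript_image and the
representation of a probe location as a (row, column) pair are
assumptions made for illustration; the signal values are invented.

```python
def transcript_image(hybrid_quantities):
    """Convert per-probe hybrid quantities into relative abundances.

    hybrid_quantities maps a probe location in the array to the
    quantity of hybrid detected there (hypothetical data layout).
    """
    population = sum(hybrid_quantities.values())  # the "population number"
    if population == 0:
        raise ValueError("no hybrids detected")
    return {loc: qty / population for loc, qty in hybrid_quantities.items()}

signals = {("A", 1): 120.0, ("A", 2): 30.0, ("B", 1): 50.0}
image = transcript_image(signals)
for location, fraction in sorted(image.items()):
    print(location, round(fraction, 3))
```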

6. EXAMPLES

The examples below are provided to illustrate the 
subject invention. These examples are provided by way of 
illustration and are not included for the purpose of 
limiting the invention. 

6.1. TISSUE SOURCES AND CELL LINES

For analysis with the computer program claimed herein,
biological sequences can be obtained from virtually any
source. Most popular are tissues obtained from the human
body. Tissues can be obtained from any organ of the body,
any age donor, any abnormality or any immortalized cell
line. Immortal cell lines may be preferred in some
instances because of their purity of cell type; other
tissue samples invariably include mixed cell types. A
special technique is available to take a single cell (for
example, a brain cell) and harness the cellular machinery
to grow up sufficient cDNA for sequencing by the techniques
and analysis described herein (cf. U.S. Patent Nos.
5,021,335 and 5,168,038, which are incorporated by
reference). The examples given herein utilized the
following immortalized cell lines: monocyte-like U-937
cells, activated macrophage-like THP-1 cells, induced
vascular endothelial cells (HUVEC cells) and mast cell-like
HMC-1 cells.

The U-937 cell line is a human histiocytic lymphoma
cell line with monocyte characteristics, established from
malignant cells obtained from the pleural effusion of a
patient with diffuse histiocytic lymphoma (Sundstrom, C.
and Nilsson, K. (1976) Int. J. Cancer 17:565). U-937 is
one of only a few human cell lines with the morphology,
cytochemistry, surface receptors and monocyte-like
characteristics of histiocytic cells. These cells can be
induced to terminal monocytic differentiation and will
express new cell surface molecules when activated with
supernatants from human mixed lymphocyte cultures. Upon
this type of in vitro activation, the cells undergo
morphological and functional changes, including
augmentation of antibody-dependent cellular cytotoxicity
(ADCC) against erythroid and tumor target cells (one of the
principal functions of macrophages). Activation of U-937
cells with phorbol 12-myristate 13-acetate (PMA) in vitro
stimulates the production of several compounds, including
prostaglandins, leukotrienes and platelet-activating factor
(PAF), which are potent inflammatory mediators. Thus,
U-937 is a cell line that is well suited for the
identification and isolation of gene transcripts associated
with normal monocytes.

The HUVEC cell line is a normal, homogeneous, well
characterized, early passage endothelial cell culture from
human umbilical vein (Cell Systems Corp., 12815 NE 124th
Street, Kirkland, WA 98034). Only gene transcripts from
induced, or treated, HUVEC cells were sequenced. One batch
of 1 X 10* cells was treated for 5 hours with 1 U/ml rIL-1b
and 100 ng/ml E. coli lipopolysaccharide (LPS) endotoxin
prior to harvesting. A separate batch of 2 X 10* cells was
treated at confluence with 4 U/ml TNF and 2 U/ml
interferon-gamma (IFN-gamma) prior to harvesting.

THP-1 is a human leukemic cell line with distinct
monocytic characteristics. This cell line was derived from
the blood of a 1-year-old boy with acute monocytic leukemia
(Tsuchiya, S. et al. (1980) Int. J. Cancer: 171-76). The
following cytological and cytochemical criteria were used
to determine the monocytic nature of the cell line: 1) the
presence of alpha-naphthyl butyrate esterase activity which
could be inhibited by sodium fluoride; 2) the production of
lysozyme; 3) the phagocytosis of latex particles and
sensitized SRBC (sheep red blood cells); and 4) the ability
of mitomycin C-treated THP-1 cells to activate
T-lymphocytes following ConA (concanavalin A) treatment.
Morphologically, the cytoplasm contained small azurophilic
granules and the nucleus was indented and irregularly
shaped with deep folds. The cell line had Fc and C3b
receptors, probably functioning in phagocytosis. THP-1
cells treated with the tumor promoter
12-O-tetradecanoyl-phorbol-13-acetate (TPA) stop
proliferating and differentiate into macrophage-like cells
which mimic native monocyte-derived macrophages in several
respects. Morphologically, as the cells change shape, the
nucleus becomes more irregular and additional phagocytic
vacuoles appear in the cytoplasm. The differentiated THP-1
cells also exhibit an increased adherence to tissue culture
plastic.

HMC-1 cells (a human mast cell line) were established
from the peripheral blood of a Mayo Clinic patient with
mast cell leukemia (Leukemia Res. (1988) 12:345-55). The
cultured cells looked similar to immature cloned murine
mast cells, contained histamine, and stained positively for
chloroacetate esterase, amino caproate esterase, eosinophil
major basic protein (MBP) and tryptase. The HMC-1 cells
have, however, lost the ability to synthesize normal IgE
receptors. HMC-1 cells also possess a 10;16 translocation,
present in cells initially collected by leukapheresis from
the patient and not an artifact of culturing. Thus, HMC-1
cells are a good model for mast cells.

6.2. CONSTRUCTION OF cDNA LIBRARIES

For inter-library comparisons, the libraries must be
prepared in similar manners. Certain parameters appear to
be particularly important to control. One such parameter
is the method of isolating mRNA. It is important to use
the same conditions to remove DNA and heterogeneous nuclear
RNA from comparison libraries. Size fractionation of cDNA
must be carefully controlled. The same vector preferably
should be used for preparing libraries to be compared. At
the very least, the same type of vector (e.g.,
unidirectional vector) should be used to assure a valid
comparison. A unidirectional vector may be preferred in
order to more easily analyze the output.

It is preferred to prime only with oligo dT
unidirectional primer in order to obtain only one clone per
mRNA transcript when obtaining cDNAs. However, it is
recognized that employing a mixture of oligo dT and random
primers can also be advantageous because such a mixture
results in more sequence diversity when gene discovery also
is a goal. Similar effects can be obtained with DR2
(Clontech) and HXLOX (US Biochemical) and also vectors from
Invitrogen and Novagen. These vectors have two
requirements. First, there must be primer sites for
commercially available primers such as T3 or M13 reverse
primers. Second, the vector must accept inserts up to 10
kb.

It also is important that the clones be randomly
sampled, and that a significant population of clones is
used. Data have been generated with 5,000 clones; however,
if very rare genes are to be obtained and/or their relative
abundance determined, as many as 100,000 clones from a
single library may need to be sampled. Size fractionation
of cDNA also must be carefully controlled. Alternately,
plaques can be selected, rather than clones.

Besides the Uni-ZAP™ vector system by Stratagene
disclosed below, it is now believed that other similarly
unidirectional vectors also can be used. For example, it
is believed that such vectors include but are not limited
to DR2 (Clontech) and HXLOX (U.S. Biochemical).

Preferably, the details of library construction (as
shown in Figure 1) are collected and stored in a database
for later retrieval relative to the sequences being
compared. Fig. 1 shows important information regarding the
library collaborator or cell or cDNA supplier,
pretreatment, biological source, culture, mRNA preparation
and cDNA construction. Similarly detailed information
about the other steps is beneficial in analyzing sequences
and libraries in depth.

RNA must be harvested from cells and tissue samples,
and cDNA libraries are subsequently constructed. cDNA
libraries can be constructed according to techniques known
in the art. (See, for example, Maniatis, T. et al. (1982)
Molecular Cloning, Cold Spring Harbor Laboratory, New
York). cDNA libraries may also be purchased. The U-937
cDNA library (catalog No. 937207) was obtained from
Stratagene, Inc., 11099 N. Torrey Pines Rd., La Jolla, CA
92037.

The THP-1 cDNA library was custom constructed by
Stratagene from THP-1 cells cultured 48 hours with 100 nM
TPA and 4 hours with 1 µg/ml LPS. The human mast cell
HMC-1 cDNA library was also custom constructed by Stratagene
from cultured HMC-1 cells. The HUVEC cDNA library was
custom constructed by Stratagene from two batches of
induced HUVEC cells which were separately processed.

Essentially, all the libraries were prepared in the
same manner. First, poly(A+) RNA (mRNA) was purified. For
the U-937 and HMC-1 RNA, cDNA synthesis was primed only
with oligo dT. For the THP-1 and HUVEC RNA, cDNA synthesis
was primed separately with both oligo dT and random
hexamers, and the two cDNA libraries were treated
separately. Synthetic adaptor oligonucleotides were
ligated onto cDNA ends, enabling its insertion into the
Uni-ZAP™ vector system (Stratagene), allowing
high-efficiency unidirectional (sense orientation) lambda
library construction and the convenience of a plasmid
system with blue-white color selection to detect clones
with cDNA insertions. Finally, the two libraries were
combined into a single library by mixing equal numbers of
bacteriophage.

The libraries can be screened with either DNA probes
or antibody probes and the pBluescript® phagemid
(Stratagene) can be rapidly excised in vivo. The phagemid
allows the use of a plasmid system for easy insert
characterization, sequencing, site-directed mutagenesis,
the creation of unidirectional deletions and expression of
fusion proteins. The custom-constructed library phage
particles were infected into E. coli host strain XL1-Blue®
(Stratagene), which has a high transformation efficiency,
increasing the probability of obtaining rare,
under-represented clones in the cDNA library.

6.3. ISOLATION OF cDNA CLONES

The phagemid forms of individual cDNA clones were
obtained by the in vivo excision process, in which the host
bacterial strain was coinfected with both the lambda
library phage and an f1 helper phage. Proteins derived
from both the library-containing phage and the helper phage
nicked the lambda DNA, initiated new DNA synthesis from
defined sequences on the lambda target DNA and created a
smaller, single stranded circular phagemid DNA molecule
that included all DNA sequences of the pBluescript® plasmid
and the cDNA insert. The phagemid DNA was secreted from
the cells and purified, then used to re-infect fresh host
cells, where the double stranded phagemid DNA was produced.
Because the phagemid carries the gene for beta-lactamase,
the newly-transformed bacteria are selected on medium
containing ampicillin.

Phagemid DNA was purified using the Magic Minipreps™
DNA Purification System (Promega catalogue #A7100, Promega
Corp., 2800 Woods Hollow Rd., Madison, WI 53711). This
small-scale process provides a simple and reliable method
for lysing the bacterial cells and rapidly isolating
purified phagemid DNA using a proprietary DNA-binding
resin. The DNA was eluted from the purification resin
already prepared for DNA sequencing and other analytical
manipulations.

Phagemid DNA was also purified using the QIAwell-8
Plasmid Purification System from QIAGEN® DNA Purification
System (QIAGEN Inc., 9259 Eton Ave., Chatsworth, CA
91311). This product line provides a convenient, rapid and
reliable high-throughput method for lysing the bacterial
cells and isolating highly purified phagemid DNA using
QIAGEN anion-exchange resin particles with EMPORE™ membrane
technology from 3M in a multiwell format. The DNA was
eluted from the purification resin already prepared for DNA
sequencing and other analytical manipulations.

An alternate method of purifying phagemid has recently
become available. It utilizes the Miniprep Kit (Catalog
No. 77468, available from Advanced Genetic Technologies
Corp., 19212 Orbit Drive, Gaithersburg, Maryland). This
kit is in the 96-well format and provides enough reagents
for 960 purifications. Each kit is provided with a
recommended protocol, which has been employed except for
the following changes. First, the 96 wells are each filled
with only 1 ml of sterile terrific broth with carbenicillin
at 25 mg/L and glycerol at 0.4%. After the wells are
inoculated, the bacteria are cultured for 24 hours and
lysed with 60 µl of lysis buffer. A centrifugation step
(2900 rpm for 5 minutes) is performed before the contents
of the block are added to the primary filter plate. The
optional step of adding isopropanol to TRIS buffer is not
routinely performed. After the last step in the protocol,
samples are transferred to a Beckman 96-well block for
storage.

Another new DNA purification system is the WIZARD™
product line which is available from Promega (catalog No.
A7071) and may be adaptable to the 96-well format.



WO 95/20681    PCT/US95/01160



6.4. SEQUENCING OF cDNA CLONES

The cDNA inserts from random isolates of the U-937 and
THP-1 libraries were sequenced in part. Methods for DNA
sequencing are well known in the art. Conventional
enzymatic methods employ DNA polymerase Klenow fragment,
Sequenase™ or Taq polymerase to extend DNA chains from an
oligonucleotide primer annealed to the DNA template of
interest. Methods have been developed for the use of both
single- and double-stranded templates. The chain
termination reaction products are usually electrophoresed
on urea-acrylamide gels and are detected either by
autoradiography (for radionuclide-labeled precursors) or by
fluorescence (for fluorescent-labeled precursors). Recent
improvements in mechanized reaction preparation, sequencing
and analysis using the fluorescent detection method have
permitted expansion in the number of sequences that can be
determined per day (such as the Applied Biosystems 373 and
377 DNA sequencers, Catalyst 800). Currently with the
system as described, read lengths range from 250 to 400
bases and are clone dependent. Read length also varies
with the length of time the gel is run. In general, the
shorter runs tend to truncate the sequence. A minimum of
only about 25 to 50 bases is necessary to establish the
identification and degree of homology of the sequence.

Gene transcript imaging can be used with any
sequence-specific method, including, but not limited to,
hybridization, mass spectroscopy, capillary electrophoresis
and 505 gel electrophoresis.

6.5. HOMOLOGY SEARCHING OF cDNA CLONE AND
DEDUCED PROTEIN (and Subsequent Steps)

Using the nucleotide sequences derived from the cDNA
clones as query sequences (sequences of a Sequence
Listing), databases containing previously identified
sequences are searched for areas of homology (similarity).
Examples of such databases include Genbank and EMBL. We
next describe examples of two homology search algorithms
that can be used, and then describe the subsequent
computer-implemented steps to be performed in accordance
with preferred embodiments of the invention.

In the following description of the
computer-implemented steps of the invention, the word
"library" denotes a set (or population) of biological
specimen nucleic acid sequences. A "library" can consist of
cDNA sequences, RNA sequences, or the like, which
characterize a biological specimen. The biological specimen
can consist of cells of a single human cell type (or can be
any of the other above-mentioned types of specimens). We
contemplate that the sequences in a library have been
determined so as to accurately represent or characterize a
biological specimen (for example, they can consist of
representative cDNA sequences from clones of RNA taken from
a single human cell).

In the following description of the
computer-implemented steps of the invention, the expression
"database" denotes a set of stored data which represent a
collection of sequences, which in turn represent a
collection of biological reference materials. For example,
a database can consist of data representing many stored
cDNA sequences which are in turn representative of human
cells infected with various viruses, cells of humans of
various ages, cells from different mammalian species, and
so on.

In preferred embodiments, the invention employs a
computer programmed with software (to be described) for
performing the following steps:

(a) processing data indicative of a library of cDNA
sequences (generated as a result of high-throughput cDNA
sequencing or other method) to determine whether each
sequence in the library matches a DNA sequence of a
reference database of DNA sequences (and if so, identifying
the reference database entry which matches the sequence and
indicating the degree of match between the reference
sequence and the library sequence) and assigning an
identified sequence value based on the sequence annotation
and degree of match to each of the sequences in the
library;

(b) for some or all entries of the database,
tabulating the number of matching identified sequence
values in the library (although this can be done by human
hand from a printout of all entries, we prefer to perform
this step using computer software to be described below),
thereby generating a set of final data values or "abundance
numbers"; and

(c) if the libraries are different sizes, dividing
each abundance number by the total number of sequences in
the library, to obtain a relative abundance number for each
identified sequence value (i.e., a relative abundance of
each gene transcript).

The list of identified sequence values (or genes
corresponding thereto) can then be sorted by abundance in
the cDNA population. A multitude of additional types of
comparisons or dimensions are possible.

For example (to be described below in greater detail),
steps (a) and (b) can be repeated for two different
libraries (sometimes referred to as a "target" library and
a "subtractant" library). Then, for each identified
sequence value (or gene transcript), a "ratio" value is
obtained by dividing the abundance number (for that
identified sequence value) for the target library, by the
abundance number (for that identified sequence value) for
the subtractant library.
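
A minimal sketch of this ratio calculation is given below. The
function name subtraction_ratios and the example counts are
hypothetical. The text above does not say how a transcript that is
absent from the subtractant library is handled, so returning None in
that case is an assumption made here only to avoid dividing by zero.

```python
def subtraction_ratios(target, subtractant):
    """Divide each target abundance number by the subtractant's.

    Both arguments map an identified sequence value to its abundance
    number. A transcript missing from the subtractant library gets
    None (an assumption; the source does not specify this case).
    """
    ratios = {}
    for gene, target_count in target.items():
        sub_count = subtractant.get(gene, 0)
        ratios[gene] = target_count / sub_count if sub_count else None
    return ratios

target_library = {"IL-8": 12, "actin": 30, "novel_1": 2}
subtractant_library = {"IL-8": 3, "actin": 30}
ratios = subtraction_ratios(target_library, subtractant_library)
print(ratios)
```

If the two libraries differ in size, the relative abundance numbers
from step (c) would be passed in instead of raw counts.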

In fact, subtraction may be carried out on multiple
libraries. It is possible to add the transcripts from
several libraries (for example, three) and then to divide
them by another set of transcripts from multiple libraries
(again, for example, three). Notation for this operation
may be abbreviated as (A+B+C)/(D+E+F), where the capital
letters each indicate an entire library. Optionally the
abundance numbers of transcripts in the summed libraries
may be divided by the total sample size before subtraction.
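
The pooled operation (A+B+C)/(D+E+F) might be sketched as below. The
names pool and pooled_ratio, the example libraries, and the use of
None for transcripts absent from the denominator pool are all
illustrative assumptions rather than details taken from the text.

```python
from collections import Counter

def pool(libraries):
    """Sum abundance numbers gene by gene across several libraries."""
    total = Counter()
    for library in libraries:
        total.update(library)
    return total

def pooled_ratio(numerator_libs, denominator_libs, normalize=False):
    """Compute (A+B+C)/(D+E+F) per gene, optionally size-normalized."""
    num, den = pool(numerator_libs), pool(denominator_libs)
    num_size, den_size = sum(num.values()), sum(den.values())
    ratios = {}
    for gene in num:
        n, d = num[gene], den[gene]  # Counter gives 0 for a missing gene
        if d == 0:
            ratios[gene] = None      # assumption: undefined ratio
            continue
        if normalize:
            n, d = n / num_size, d / den_size
        ratios[gene] = n / d
    return ratios

A, B, C = {"x": 2, "y": 1}, {"x": 1}, {"x": 1, "z": 4}
D, E, F = {"x": 1, "y": 1}, {"x": 1}, {"y": 2}
raw = pooled_ratio([A, B, C], [D, E, F])
scaled = pooled_ratio([A, B, C], [D, E, F], normalize=True)
print(raw, scaled)
```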

Unlike standard hybridization technology which permits
a single subtraction of two libraries, once one has
processed a set or library of transcript sequences and
stored them in the computer, any number of subtractions can
be performed on the library. For example, by this method,
ratio values can be obtained by dividing relative abundance
values in a first library by corresponding values in a
second library and vice versa.

In variations on step (a), the library consists of
nucleotide sequences derived from cDNA clones. Examples of
databases which can be searched for areas of homology
(similarity) in step (a) include the commercially available
databases known as Genbank (NIH), EMBL (European Molecular
Biology Labs, Germany), and GENESEQ (Intelligenetics,
Mountain View, California).

One homology search algorithm which can be used to
implement step (a) is the algorithm described in the paper
by D.J. Lipman and W.R. Pearson, entitled "Rapid and
Sensitive Protein Similarity Searches," Science, 227:1435
(1985). In this algorithm, the homologous regions are
searched in a two-step manner. In the first step, the
highest homologous regions are determined by calculating a
matching score using a homology score table. The parameter
"Ktup" is used in this step to establish the minimum window
size to be shifted for comparing two sequences. Ktup also
sets the number of bases that must match to extract the
highest homologous region among the sequences. In this
step, no insertions or deletions are applied and the
homology is displayed as an initial (INIT) value.
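
The role of Ktup in the first step can be illustrated with a
simplified sketch. It is not the published algorithm: it merely
counts exact ktup-length word matches along each diagonal (subject
position minus query position) and reports the densest diagonal,
whereas the real method scores regions with a homology score table.
The function name best_diagonal and the test sequences are
hypothetical.

```python
from collections import defaultdict

def best_diagonal(query, subject, ktup=4):
    """Return (diagonal, hits) for the diagonal with most ktup matches.

    No insertions or deletions are considered, mirroring the first
    step described above. Returns (None, 0) if no word is shared.
    """
    # Index every ktup-length word of the subject by its positions.
    index = defaultdict(list)
    for j in range(len(subject) - ktup + 1):
        index[subject[j:j + ktup]].append(j)

    # Each shared word votes for the diagonal j - i it lies on.
    hits = defaultdict(int)
    for i in range(len(query) - ktup + 1):
        for j in index.get(query[i:i + ktup], ()):
            hits[j - i] += 1

    if not hits:
        return None, 0
    diagonal = max(hits, key=lambda d: (hits[d], -abs(d)))
    return diagonal, hits[diagonal]

print(best_diagonal("ACGTACGG", "TTACGTACGGTT", ktup=4))
```

A larger ktup makes the search faster but less sensitive, since more
consecutive bases must match exactly before a region is considered.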

In the second step, the homologous regions are aligned
to obtain the highest matching score by inserting a gap in
order to add a probable deleted portion. The matching
score obtained in the first step is recalculated using the
homology score Table and the insertion score Table to an
optimized (OPT) value in the final output.

DNA homologies between two sequences can be examined
graphically using the Harr method of constructing dot
matrix homology plots (Needleman, S.B. and Wunsch, C.D.,
J. Mol. Biol. 48:443 (1970)). This method produces a
two-dimensional plot which can be useful in determining
regions of homology versus regions of repetition.
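
A dot matrix plot of this general kind can be sketched in text form
as follows. The function name dot_matrix, the window length and the
use of "*" and "." characters are illustrative choices, not details
of the cited method.

```python
def dot_matrix(seq_a, seq_b, window=3):
    """Return rows of a text dot plot comparing two sequences.

    A "*" at row i, column j means the window-length words starting
    at seq_a[i] and seq_b[j] are identical; "." means they differ.
    """
    rows = []
    for i in range(len(seq_a) - window + 1):
        word_a = seq_a[i:i + window]
        row = ""
        for j in range(len(seq_b) - window + 1):
            row += "*" if word_a == seq_b[j:j + window] else "."
        rows.append(row)
    return rows

for line in dot_matrix("ACAC", "ACAC", window=2):
    print(line)
```

A run of marks along one diagonal indicates a region of homology,
while marks on several parallel diagonals, as in this repeated
example, indicate repetition.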

However, in a class of preferred embodiments, step (a)
is implemented by processing the library data in the
commercially available computer program known as the
INHERIT 670 Sequence Analysis System, available from
Applied Biosystems Inc. (Foster City, California),
including the software known as the Factura software (also
available from Applied Biosystems Inc.). The Factura
program preprocesses each library sequence to "edit out"
portions thereof which are not likely to be of interest,
such as the vector used to prepare the library. Additional
sequences which can be edited out or masked (ignored by the
search tools) include but are not limited to the polyA tail
and repetitive GAG and CCC sequences. A low-end search
program can be written to mask out such "low-information"
sequences, or programs such as BLAST can ignore the
low-information sequences.
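
A small masking routine of the kind contemplated above might look
like the following. It is not the Factura program: the function name
mask_low_information, the thresholds min_tail and min_run, and the
choice to replace masked bases with "N" are assumptions made for
illustration, and only a trailing poly(A) tail and long single-base
runs are handled.

```python
import re

def mask_low_information(seq, min_tail=10, min_run=12):
    """Replace a poly(A) tail and long homopolymer runs with N's."""
    seq = seq.upper()
    # Mask a trailing run of at least min_tail A's.
    seq = re.sub(r"A{%d,}$" % min_tail,
                 lambda m: "N" * len(m.group()), seq)
    # Mask any run of at least min_run identical bases.
    seq = re.sub(r"([ACGT])\1{%d,}" % (min_run - 1),
                 lambda m: "N" * len(m.group()), seq)
    return seq

print(mask_low_information("ACGTACGT" + "A" * 15))
print(mask_low_information("ACGT" + "C" * 12 + "GATTACA"))
```

Masking with a placeholder of equal length, rather than deleting
bases, keeps the coordinates of the remaining sequence unchanged.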

In the algorithm implemented by the INHERIT 670
Sequence Analysis System, the Pattern Specification
Language (developed by TRW Inc.) is used to determine
regions of homology. "There are three parameters that
determine how INHERIT analysis runs sequence comparisons:
window size, window offset and error tolerance. Window
size specifies the length of the segments into which the
query sequence is subdivided. Window offset specifies
where to start the next segment [to be compared], counting
from the beginning of the previous segment. Error
tolerance specifies the total number of insertions,
deletions and/or substitutions that are tolerated over the
specified word length. Error tolerance may be set to any
integer between 0 and 6. The default settings are window
tolerance=20, window offset=10 and error tolerance=3."
INHERIT Analysis Users Manual, pp. 2-151 Version 1.0,
Applied Biosystems, Inc., October 1991.
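
The way the three parameters interact can be illustrated with a
short sketch. It is not the INHERIT software: the function names are
hypothetical, the defaults of 20 and 10 read the quoted "window
tolerance=20" as the window size (an assumption), a trailing partial
window is simply dropped, and the tolerance check counts
substitutions only, not insertions or deletions.

```python
def query_windows(query, window_size=20, window_offset=10):
    """Split a query into overlapping (start, segment) windows."""
    if window_size <= 0 or window_offset <= 0:
        raise ValueError("window size and offset must be positive")
    last_start = max(len(query) - window_size, 0)
    return [(start, query[start:start + window_size])
            for start in range(0, last_start + 1, window_offset)]

def within_tolerance(segment, candidate, error_tolerance=3):
    """True if two equal-length words differ at few enough positions."""
    if len(segment) != len(candidate):
        return False
    mismatches = sum(1 for x, y in zip(segment, candidate) if x != y)
    return mismatches <= error_tolerance

query = "ACGT" * 11 + "A"          # 45 bases
windows = query_windows(query)
print([start for start, _ in windows])
print(within_tolerance("ACGTACGT", "ACGAACGA"))
```

With an offset smaller than the window size, consecutive segments
overlap, so a homologous region that straddles a segment boundary is
still covered by at least one window.
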

Using a combination of these three parameters, a
database (such as a DNA database) can be searched for
sequences containing regions of homology and the
appropriate sequences are scored with an initial value.
Subsequently, these homologous regions are examined using
dot matrix homology plots to determine regions of homology
versus regions of repetition. Smith-Waterman alignments
can be used to display the results of the homology search.
The INHERIT software can be executed by a Sun computer
system programmed with the UNIX operating system.

Search alternatives to INHERIT include the BLAST
program, GCG (available from the Genetics Computer Group,
WI) and the Dasher program (Temple Smith, Boston
University, Boston, MA). Nucleotide sequences can be
searched against Genbank, EMBL or custom databases such as
GENESEQ (available from Intelligenetics, Mountain View, CA)
or other databases for genes. In addition, we have
searched some sequences against our own in-house database.

In preferred embodiments, the transcript sequences are
analyzed by the INHERIT software for best conformance with
a reference gene transcript, in order to assign a sequence
identifier and a degree of homology. Together these make
up the identified sequence value, which is input into, and
further processed by, a Macintosh personal computer
(available from Apple) programmed with an "abundance sort
and subtraction analysis" computer program (to be described
below).

Prior to the abundance sort and subtraction analysis
program (also denoted as the "abundance sort" program),
identified sequences from the cDNA clones are assigned a
value (according to the parameters given above) by degree
of match according to the following categories: "exact"
matches (regions with a high degree of identity),
homologous human matches (regions of high similarity, but
not "exact" matches), homologous non-human matches (regions
of high similarity present in species other than human), or
non-matches (no significant regions of homology to
previously identified nucleotide sequences stored in the
form of the database). Alternately, the degree of match
can be a numeric value as described below.
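
The four categories could be assigned by a rule such as the one
sketched below. The function name match_category and both cutoff
values are hypothetical; the text above does not state numeric
thresholds, and it does not say whether an "exact" match must be to
a human entry, so this sketch ignores species for that category.

```python
def match_category(identity, species,
                   exact_cutoff=0.98, homology_cutoff=0.60):
    """Assign a degree-of-match category to one library sequence.

    identity: fraction of identical positions to the best database
    entry (0.0 to 1.0); species: source of that entry. Both cutoffs
    are illustrative assumptions.
    """
    if identity >= exact_cutoff:
        return "exact"
    if identity >= homology_cutoff:
        if species == "human":
            return "homologous human"
        return "homologous non-human"
    return "non-match"

for identity, species in [(0.99, "human"), (0.80, "human"),
                          (0.80, "mouse"), (0.30, "human")]:
    print(identity, species, match_category(identity, species))
```
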

With reference again to the step of identifying
matches between reference sequences and database entries,
protein and peptide sequences can be deduced from the
nucleic acid sequences. Using the deduced polypeptide
sequence, the match identification can be performed in a
manner analogous to that done with cDNA sequences. A
protein sequence is used as a query sequence and compared
to the previously identified sequences contained in a
database such as the Swiss/Prot, PIR and the NBRF Protein
database to find homologous proteins. These proteins are
initially scored for homology using a homology score Table
(Orcutt, B.C. and Dayhoff, M.O. Scoring Matrices, PIR
Report MAT-0285 (February 1985)) resulting in an INIT
score. The homologous regions are aligned to obtain the
highest matching scores by inserting a gap which adds a
probable deleted portion. The matching score is
recalculated using the homology score Table and the
insertion score Table resulting in an optimized (OPT)
score. Even in the absence of knowledge of the proper
reading frame of an isolated sequence, the above-described
protein homology search may be performed by searching all 3
reading frames.
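
Deducing a protein query in each of the three forward reading frames
can be sketched as follows, using the standard genetic code. The
function name three_frame_translations is hypothetical, stop codons
are written as "*", and any codon containing an ambiguous base is
written as "X" (an illustrative convention).

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}

def three_frame_translations(dna):
    """Translate a DNA sequence in forward frames 0, 1 and 2."""
    dna = dna.upper()
    frames = []
    for offset in range(3):
        codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
        frames.append("".join(CODON_TABLE.get(c, "X") for c in codons))
    return frames

print(three_frame_translations("ATGGCCTAA"))
```

Each of the three deduced sequences would then be used in turn as a
query against the protein database.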

Peptide and protein sequence homologies can also be
ascertained using the INHERIT 670 Sequence Analysis System
in a way analogous to that used for DNA sequence
homologies. Pattern Specification Language and parameter
windows are used to search protein databases for sequences
containing regions of homology which are scored with an
initial value. Subsequent display in a dot-matrix homology
plot shows regions of homology versus regions of
repetition. Additional search tools that are available to
use on pattern search databases include PLsearch Blocks
(available from Henikoff & Henikoff, University of
Washington, Seattle), Dasher and GCG. Pattern search
databases include, but are not limited to, Protein Blocks
(available from Henikoff & Henikoff, University of
Washington, Seattle), Brookhaven Protein (available from
the Brookhaven National Laboratory, Brookhaven, MA),
PROSITE (available from Amos Bairoch, University of Geneva,
Switzerland), ProDom (available from Temple Smith, Boston
University), and PROTEIN MOTIF FINGERPRINT (available from
University of Leeds, United Kingdom).

The ABI Assembler application software, part of the 
INHERIT DNA analysis system (available from Applied 
Biosystems, Inc., Foster City, CA), can be employed to 
create and manage sequence assembly projects by assembling 
data from selected sequence fragments into a larger 
sequence. The Assembler software combines two advanced 
computer technologies which maximize the ability to 
assemble sequenced DNA fragments into Assemblages, a 
special grouping of data where the relationships between 
sequences are shown by graphic overlap, alignment and 
statistical views. The process is based on the 
Meyers-Kececioglu model of fragment assembly (INHERIT™ 
Assembler User's Manual, Applied Biosystems, Inc., Foster 
City, CA), and uses graph theory as the foundation of a 
very rigorous multiple sequence alignment engine for 
assembling DNA sequence fragments. Other assembly programs 
that can be used include MEGALIGN (available from DNASTAR 
Inc., Madison, WI), Dasher and STADEN (available from Roger 
Staden, Cambridge, England). 

Next, with reference to Fig. 2, we describe in more 
detail the "abundance sort" program which implements 
above-mentioned "step (b)" to tabulate the number of 
sequences of the library which match each database entry 
(the "abundance number" for each database entry). 
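
The tabulation performed in "step (b)" can be sketched in a few lines of Python. This is only an illustrative sketch and not the FoxBASE listing of Table 5: the function name, the representation of an identified sequence as a (clone number, entry) pair, and the tie-breaking by entry name are all assumptions made here.

```python
from collections import Counter


def abundance_numbers(identified_sequences):
    """Count how many library sequences match each database entry.

    identified_sequences: iterable of (clone_number, entry) pairs, where
    entry is the matched database entry, or None when nothing matched.
    Returns (entry, abundance number) pairs in order of decreasing abundance.
    """
    counts = Counter(entry for _, entry in identified_sequences if entry is not None)
    # Sort by decreasing abundance; ties are ordered by entry name (an assumption).
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    demo = [(15365, "HSRPL41"), (15366, "HSRPL41"), (15193, "HSFIB1"), (15999, None)]
    print(abundance_numbers(demo))
```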

Fig. 2 is a flow chart of a preferred embodiment of 
the abundance sort program. A source code listing of this 
embodiment of the abundance sort program is set forth in 
Table 5. In the Table 5 implementation, the abundance sort 
program is written using the FoxBASE programming language 
commercially available from Microsoft Corporation. 
Although FoxBASE was the program chosen for the first 
iteration of this technology, it should not be considered 
limiting. Many other programming languages, Sybase being a 
particularly desirable alternative, can also be used, as 
will be obvious to one with ordinary skill in the art. The 
subroutine names specified in Fig. 2 correspond to 
subroutines listed in Table 5. 

With reference again to Fig. 2, the "Identified 
Sequences" are transcript sequences representing each 
sequence of the library and a corresponding identification 
of the database entry (if any) which it matches. In other 
words, the "Identified Sequences" are transcript sequences 
representing the output of above-discussed "step (a)." 

Fig. 3 is a block diagram of a system for implementing 
the invention. The Fig. 3 system includes library 
generation unit 2 which generates a library and asserts an 
output stream of transcript sequences indicative of the 
biological sequences comprising the library. Programmed 
processor 4 receives the data stream output from unit 2 and 
processes this data in accordance with above-discussed 
"step (a)" to generate the Identified Sequences. Processor 
4 can be a processor programmed with the commercially 
available computer program known as the INHERIT 670 
Sequence Analysis System and the commercially available 
computer program known as the Factura program (both 
available from Applied Biosystems Inc.) and with the UNIX 
operating system. 

Still with reference to Fig. 3, the Identified 
Sequences are loaded into processor 6 which is programmed 
with the abundance sort program. Processor 6 generates the 
Final Transcript sequences indicated in both Figs. 2 and 3. 
Fig. 4 shows a more detailed block diagram of a planned 
relational computer system, including various searching 
techniques which can be implemented, along with an 
assortment of databases to query against. 

With reference to Fig. 2, the abundance sort program 
first performs an operation known as "Tempnum" on the 
Identified Sequences, to discard all of the Identified 
Sequences except those which match database entries of 
selected types. For example, the Tempnum process can 
select Identified Sequences which represent matches of the 
following types with database entries (see above for 
definition): "exact" matches, human "homologous" matches, 
"other species" matches (representing genes present in 
species other than human), "no" matches (no significant 
regions of homology with database entries representing 
previously identified nucleotide sequences), "I" matches 
(Incyte for not previously known DNA sequences), or "X" 
matches (matches ESTs in reference database). This 
eliminates the U, S, M, V, A, R and D sequences (see Table 1 
for definitions). 
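
A minimal Python sketch of such a "Tempnum"-style selection follows. It keys only on the single-letter codes that the text says are eliminated (U, S, M, V, A, R and D); the record layout and the field name `match_code` are hypothetical, and the actual Tempnum subroutine in Table 5 may be organized differently.

```python
# Match-type codes that the text says are eliminated by Tempnum.
DISCARDED_CODES = frozenset("USMVARD")


def tempnum(identified_sequences):
    """Keep only the records whose match-type code is not in the discarded set.

    Each record is a dict with a hypothetical "match_code" field holding the
    single-letter match designation defined in Table 1.
    """
    return [rec for rec in identified_sequences if rec["match_code"] not in DISCARDED_CODES]


if __name__ == "__main__":
    demo = [
        {"clone": 1, "match_code": "I"},
        {"clone": 2, "match_code": "U"},
        {"clone": 3, "match_code": "X"},
    ]
    print(tempnum(demo))
```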

The identified sequence values selected during the 
"Tempnum" process then undergo a further selection (weeding 
out) operation known as "Tempred." This operation can, for 
example, discard all identified sequence values 
representing matches with selected database entries. 

The identified sequence values selected during the 
"Tempred" process are then classified according to library, 
during the "Tempdesig" operation. It is contemplated that 
the "Identified Sequences" can represent sequences from a 
single library, or from two or more libraries. 

Consider first the case that the identified sequence 
values represent sequences from a single library. In this 
case, all the identified sequence values determined during 
"Tempred" undergo sorting in the "Templib" operation, 
further sorting in the "Libsort" operation, and finally 
additional sorting in the "Temptarsort" operation. For 
example, these three sorting operations can sort the 
identified sequences in order of decreasing "abundance 
number" (to generate a list of decreasing abundance 
numbers, each abundance number corresponding to a unique 
identified sequence entry, or several lists of decreasing 
abundance numbers, with the abundance numbers in each list 
corresponding to database entries of a selected type) with 
redundancies eliminated from each sorted list. In this 
case, the operation identified as "Cruncher" can be 
bypassed, so that the "Final Data" values are the organized 
transcript sequences produced during the "Temptarsort" 
operation. 

We next consider the case that the transcript 
sequences produced during the "Tempred" operation represent 
sequences from two libraries (which we will denote the 
"target" library and the "subtractant" library). For 
example, the target library may consist of cDNA sequences 
from clones of a diseased cell, while the subtractant 
library may consist of cDNA sequences from clones of the 
diseased cell after treatment by exposure to a drug. For 
another example, the target library may consist of cDNA 
sequences from clones of a cell type from a young human, 
while the subtractant library may consist of cDNA sequences 
from clones of the same cell type from the same human at 
different ages. 

In this case, the "Tempdesig" operation routes all 
transcript sequences representing the target library for 
processing in accordance with "Templib" (and then "Libsort" 
and "Temptarsort"), and routes all transcript sequences 
representing the subtractant library for processing in 
accordance with "Tempsub" (and then "Subsort" and 
"Tempsubsort"). For example, the consecutive "Templib," 
"Libsort," and "Temptarsort" sorting operations sort 
identified sequences from the target library in order of 
decreasing abundance number (to generate a list of 
decreasing abundance numbers, each abundance number 
corresponding to a database entry, or several lists of 
decreasing abundance numbers, with the abundance numbers in 
each list corresponding to database entries of a selected 
type) with redundancies eliminated from each sorted list. 
The consecutive "Tempsub," "Subsort," and "Tempsubsort" 
sorting operations sort identified sequences from the 
subtractant library in order of decreasing abundance number 
(to generate a list of decreasing abundance numbers, each 
abundance number corresponding to a database entry, or 
several lists of decreasing abundance numbers, with the 
abundance numbers in each list corresponding to database 
entries of a selected type) with redundancies eliminated 
from each sorted list. 

The transcript sequences output from the "Temptarsort" 
operation typically represent sorted lists from which a 
histogram could be generated in which position along one 
(e.g., horizontal) axis indicates abundance number (of 
target library sequences), and position along another 
(e.g., vertical) axis indicates identified sequence value 
(e.g., human or non-human gene type). Similarly, the 
transcript sequences output from the "Tempsubsort" 
operation typically represent sorted lists from which a 
histogram could be generated in which position along one 
(e.g., horizontal) axis indicates abundance number (of 
subtractant library sequences), and position along another 
(e.g., vertical) axis indicates identified sequence value 
(e.g., human or non-human gene type). 

The transcript sequences (sorted lists) output from 
the Tempsubsort and Temptarsort sorting operations are 
combined during the operation identified as "Cruncher." 
The "Cruncher" process identifies pairs of corresponding 
target and subtractant abundance numbers (both representing 
the same identified sequence value), and divides one by the 
other to generate a "ratio" value for each pair of 
corresponding abundance numbers, and then sorts the ratio 
values in order of decreasing ratio value. The data output 
from the "Cruncher" operation (the Final Transcript 
sequence in Fig. 2) is typically a sorted list from which a 
histogram could be generated in which position along one 
axis indicates the size of a ratio of abundance numbers 
(for corresponding identified sequence values from target 
and subtractant libraries) and position along another axis 
indicates identified sequence value (e.g., gene type). 
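
The pairing, division and sorting attributed to "Cruncher" can be sketched as follows. This is a hedged illustration rather than the Table 5 subroutine: the function name and the dict-based inputs are assumptions, and the text says only that one abundance number is divided by the other, so dividing target by subtractant is a choice made for this sketch.

```python
def cruncher(target_abundance, subtractant_abundance):
    """Pair abundance numbers for entries found in both libraries and rank by ratio.

    Both arguments map a database entry to its abundance number in one library.
    Returns (entry, ratio) pairs sorted by decreasing ratio.
    """
    # Only entries present in both sorted lists form a corresponding pair.
    shared = target_abundance.keys() & subtractant_abundance.keys()
    ratios = [
        (entry, target_abundance[entry] / subtractant_abundance[entry])
        for entry in shared
    ]
    # Decreasing ratio; ties are ordered by entry name (an assumption).
    return sorted(ratios, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    target = {"A": 12, "B": 3, "C": 5}
    subtractant = {"A": 3, "B": 6, "D": 1}
    print(cruncher(target, subtractant))
```

Entries present in only one library are left out of this sketch; Example 6.8 describes how such entries are handled in practice.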

Preferably, prior to obtaining a ratio between the two 
library abundance values, the Cruncher operation also 
divides each ratio value by the total number of sequences 
in one or both of the target and subtractant libraries. 
The resulting lists of "relative" ratio values generated by 
the Cruncher operation are useful for many medical, 
scientific, and industrial applications. Also preferably, 
the output of the Cruncher operation is a set of lists, 
each list representing a sequence of decreasing ratio 
values for a different selected subset (e.g., protein 
family) of database entries. 
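
One possible reading of this normalization is that each abundance number is divided by the size of its own library before the ratio is taken, so that libraries of different sizes can be compared. The sketch below follows that reading; it is an interpretation and not the Table 5 code, and the function and parameter names are hypothetical.

```python
def relative_ratio(target_count, target_total, subtractant_count, subtractant_total):
    """Ratio of relative abundances: (t / T) / (s / S).

    target_total and subtractant_total are the total numbers of sequences
    in the target and subtractant libraries, respectively.
    """
    values = (target_count, target_total, subtractant_count, subtractant_total)
    if min(values) <= 0:
        raise ValueError("counts and library sizes must be positive")
    # (t / T) / (s / S) rearranged into a single division.
    return (target_count * subtractant_total) / (subtractant_count * target_total)


if __name__ == "__main__":
    # 10 of 1000 target clones versus 5 of 2000 subtractant clones.
    print(relative_ratio(10, 1000, 5, 2000))
```

With equal library sizes the result reduces to the plain ratio of the two abundance numbers.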

In one example, the abundance sort program of the 
invention tabulates for a library the numbers of mRNA 
transcripts corresponding to each gene identified in a 
database. These numbers are divided by the total number of 
clones sampled. The results of the division reflect the 
relative abundance of the mRNA transcripts in the cell type 
or tissue from which they were obtained. Obtaining this 
final data set is referred to herein as "gene transcript 
image analysis." The resulting subtracted data show 
exactly what proteins and genes are upregulated and 
downregulated in highly detailed complexity. 

6.6. HUVEC cDNA LIBRARY 
Table 2 is an abundance table listing the various gene 
transcripts in an induced HUVEC library. The transcripts 
are listed in order of decreasing abundance. This 
computerized sorting simplifies analysis of the tissue and 
speeds identification of significant new proteins which are 
specific to this cell type. This type of endothelial cell 
lines tissues of the cardiovascular system, and the more 
that is known about its composition, particularly in 
response to activation, the more choices of protein targets 
become available to affect in treating disorders of this 
tissue, such as the highly prevalent atherosclerosis. 

6.7. MONOCYTE-CELL AND MAST-CELL cDNA LIBRARIES 
Tables 3 and 4 show truncated comparisons of two 
libraries. In Tables 3 and 4 the "normal monocytes" are 
the HMC-1 cells, and the "activated macrophages" are the 
THP-1 cells pretreated with PMA and activated with LPS. 
Table 3 lists in descending order of abundance the most 
abundant gene transcripts for both cell types. With only 
15 gene transcripts from each cell type, this table permits 
quick, qualitative comparison of the most common 
transcripts. This abundance sort, with its convenient 
side-by-side display, provides an immediately useful 
research tool. In this example, this research tool 
discloses that 1) only one of the top 15 activated 
macrophage transcripts is found in the top 15 normal 
monocyte gene transcripts (poly A binding protein); and 2) 
a new gene transcript (previously unreported in other 
databases) is relatively highly represented in activated 
macrophages but is not similarly prominent in normal 
macrophages. Such a research tool provides researchers 
with a short-cut to new proteins, such as receptors, 
cell-surface and intracellular signalling molecules, which 
can serve as drug targets in commercial drug screening 
programs. Such a tool could save considerable time over 
that consumed by a hit and miss discovery program aimed at 
identifying important proteins in and around cells, because 
those proteins carrying out everyday cellular functions and 
represented as steady state mRNA are quickly eliminated 
from further characterization. 

This illustrates how the gene transcript profiles 
change with altered cellular function. Those skilled in 
the art know that the biochemical composition of cells also 
changes with other functional changes such as cancer, 
including cancer's various stages, and exposure to 
toxicity. A gene transcript subtraction profile such as in 
Table 3 is useful as a first screening tool for such gene 
expression and protein studies. 

6.8. SUBTRACTION ANALYSIS OF NORMAL MONOCYTE-CELL AND 
ACTIVATED MONOCYTE CELL cDNA LIBRARIES 

Once the cDNA data were in the computer, the computer 
program as disclosed in Table 5 was used to obtain ratios 
of all the gene transcripts in the two libraries discussed 
in Example 6.7, and the gene transcripts were sorted by the 
descending values of their ratios. If a gene transcript is 
not represented in one library, that gene transcript's 
abundance is unknown but appears to be less than 1. As an 
approximation (and to obtain a ratio, which would not be 
possible if the unrepresented gene were given an abundance 
of zero), genes which are represented in only one of the 
two libraries are assigned an abundance of 1/2. Using 1/2 
for unrepresented clones increases the relative importance 
of "turned-on" and "turned-off" genes, whose products would 
be drug candidates. The resulting print-out is called a 
subtraction table and is an extremely valuable screening 
method, as is shown by the following data. 
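
The 1/2 rule can be made concrete with a short Python sketch. It is not the Table 5 program: the function name and the dict inputs are assumptions, the column names `bgfreq` and `rfend` are borrowed from Table 4, and treating `rfend` as the count in the library being analyzed is an inference from the Table 4 figures (for example, 131 and 0 giving a ratio of 262.00, and 121 and 3 giving 40.333). The entry `GENE_OFF` in the demonstration is made up.

```python
UNREPRESENTED_ABUNDANCE = 0.5  # stand-in abundance for a gene absent from one library


def subtraction_table(target_counts, background_counts):
    """Build (entry, bgfreq, rfend, ratio) rows sorted by descending ratio.

    target_counts and background_counts map a database entry to its
    abundance number in the target and subtracted libraries.
    """
    rows = []
    for entry in target_counts.keys() | background_counts.keys():
        rfend = target_counts.get(entry, 0)
        bgfreq = background_counts.get(entry, 0)
        # A count of zero is replaced by 1/2 so that a ratio can be formed.
        ratio = (rfend or UNREPRESENTED_ABUNDANCE) / (bgfreq or UNREPRESENTED_ABUNDANCE)
        rows.append((entry, bgfreq, rfend, ratio))
    return sorted(rows, key=lambda row: (-row[3], row[0]))


if __name__ == "__main__":
    target = {"HUMIL1": 131, "HUMMIP1A": 121}
    background = {"HUMMIP1A": 3, "GENE_OFF": 4}  # GENE_OFF is hypothetical
    for row in subtraction_table(target, background):
        print(row)
```

"Turned-on" genes rise to the top of the sorted list and "turned-off" genes sink to the bottom, with ratios well below 1.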

Table 4 is a subtraction table, in which the normal 
monocyte library was electronically "subtracted" from the 
activated macrophage library. This table highlights most 
effectively the changes in abundance of the gene 
transcripts by activation of macrophages. Even among the 
first 20 gene transcripts listed, there are several unknown 
gene transcripts. Thus, electronic subtraction is a useful 
tool with which to assist researchers in identifying much 
more quickly the basic biochemical changes between two cell 
types. Such a tool can save universities and 
pharmaceutical companies which spend billions of dollars on 
research valuable time and laboratory resources at the 
early discovery stage and can speed up the drug development 
cycle, which in turn permits researchers to set up drug 
screening programs much earlier. Thus, this research tool 
provides a way to get new drugs to the public faster and 
more economically. 

Also, such a subtraction table can be obtained for 
patient diagnosis. An individual patient sample (such as 
monocytes obtained from a biopsy or blood sample) can be 
compared with data provided herein to diagnose conditions 
associated with macrophage activation. 

Table 4 uncovered many new gene transcripts (labeled 
Incyte clones). Note that many genes are turned on in the 
activated macrophage (i.e., the monocyte had a 0 in the 
bgfreq column). This screening method is superior to other 
screening techniques, such as the western blot, which are 
incapable of uncovering such a multitude of discrete new 
gene transcripts. 

The subtraction-screening technique has also uncovered 
a high number of cancer gene transcripts (oncogenes rho, 
ETS2, rab-2 ras, YPT1-related, and acute myeloid leukemia 
mRNA) in the activated macrophage. These transcripts may 
be attributed to the use of immortalized cell lines and are 
inherently interesting for that reason. This screening 
technique offers a detailed picture of upregulated 
transcripts including oncogenes, which helps explain why 
anti-cancer drugs interfere with the patient's immunity 
mediated by activated macrophages. Armed with knowledge 
gained from this screening method, those skilled in the art 
can set up more targeted, more effective drug screening 
programs to identify drugs which are differentially 
effective against 1) both relevant cancers and activated 
macrophage conditions with the same gene transcript 
profile; 2) cancer alone; and 3) activated macrophage 
conditions. 

Smooth muscle senescent protein (22 kd) was 
upregulated in the activated macrophage, which indicates 
that it is a candidate to block in controlling 
inflammation. 

6.9. SUBTRACTION ANALYSIS OF NORMAL LIVER CELLS AND 
HEPATITIS INFECTED LIVER CELL cDNA LIBRARIES 

In this example, rats are exposed to hepatitis virus 
and maintained in the colony until they show definite signs 
of hepatitis. Of the rats diagnosed with hepatitis, one 
half of the rats are treated with a new anti-hepatitis 
agent (AHA). Liver samples are obtained from all rats 
before exposure to the hepatitis virus and at the end of 
AHA treatment or no treatment. In addition, liver samples 
can be obtained from rats with hepatitis just prior to AHA 
treatment. 

The liver tissue is treated as described in Examples 
6.2 and 6.3 to obtain mRNA and subsequently to sequence 
cDNA. The cDNA from each sample are processed and analyzed 
for abundance according to the computer program in Table 5. 
The resulting gene transcript images of the cDNA provide 
detailed pictures of the baseline (control) for each animal 
and of the infected and/or treated state of the animals. 
cDNA data for a group of samples can be combined into a 
group summary gene transcript profile for all control 
samples, all samples from infected rats and all samples 
from AHA-treated rats. 

Subtractions are performed between appropriate 
individual libraries and the grouped libraries. For 
individual animals, control and post-study samples can be 
subtracted. Also, if samples are obtained before and after 
AHA treatment, that data from individual animals and 
treatment groups can be subtracted. In addition, the data 
for all control samples can be pooled and averaged. The 
control average can be subtracted from averages of both 
post-study AHA and post-study non-AHA cDNA samples. If 
pre- and post-treatment samples are available, pre- and 
post-treatment samples can be compared individually (or 
electronically averaged) and subtracted. 

These subtraction tables are used in two general ways. 
First, the differences are analyzed for gene transcripts 
which are associated with continuing hepatic deterioration 
or healing. The subtraction tables are tools to isolate 
the effects of the drug treatment from the underlying basic 
pathology of hepatitis. Because hepatitis affects many 
parameters, additional liver toxicity has been difficult to 
detect with only blood tests for the usual enzymes. The 
gene transcript profile and subtraction provides a much 
more complex biochemical picture which researchers have 
needed to analyze such difficult problems. 

Second, the subtraction tables provide a tool for 
identifying clinical markers, individual proteins or other 
biochemical determinants which are used to predict and/or 
evaluate a clinical endpoint, such as disease, improvement 
due to the drug, and even additional pathology due to the 
drug. The subtraction tables specifically highlight genes 
which are turned on or off. Thus, the subtraction tables 
provide a first screen for a set of gene transcript 
candidates for use as clinical markers. Subsequently, 
electronic subtractions of additional cell and tissue 
libraries reveal which of the potential markers are in fact 
found in different cell and tissue libraries. Candidate 
gene transcripts found in additional libraries are removed 
from the set of potential clinical markers. Then, tests of 
blood or other relevant samples which are known to lack and 
have the relevant condition are compared to validate the 
selection of the clinical marker. In this method, the 
particular physiologic function of the protein transcript 
need not be determined to qualify the gene transcript as a 
clinical marker. 

6.10. ELECTRONIC NORTHERN BLOT 
One limitation of electronic subtraction is that it is 
difficult to compare more than a pair of images at once. 
Once particular individual gene products are identified as 
relevant to further study (via electronic subtraction or 
other methods), it is useful to study the expression of 
single genes in a multitude of different tissues. In the 
lab, the technique of "Northern" blot hybridization is used 
for this purpose. In this technique, a single cDNA, or a 
probe corresponding thereto, is labeled and then hybridized 
against a blot containing RNA samples prepared from a 
multitude of tissues or cell types. Upon autoradiography, 
the pattern of expression of that particular gene, one at a 
time, can be quantitated in all the included samples. 

In contrast, a further embodiment of this invention is 
the computerized form of this process, termed here 
"electronic northern blot." In this variation, a single 
gene is queried for expression against a multitude of 
prepared and sequenced libraries present within the 
database. In this way, the pattern of expression of any 
single candidate gene can be examined instantaneously and 
effortlessly. More candidate genes can thus be scanned, 
leading to more frequent and fruitfully relevant 
discoveries. The computer program included as Table 5 
includes a program for performing this function, and Table 
6 is a partial listing of entries of the database used in 
the electronic northern blot analysis. 
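
A query of this kind can be sketched in Python as a lookup of one database entry across several library abundance tables. The sketch below is illustrative only and does not reproduce the Table 5 program: the function name and data layout are assumptions, and the libraries `LIB_B` and `LIB_C` with their counts are made up. The HUVEC figures (9 clones of entry HSET out of 5000 analyzed) are taken from Table 2.

```python
def electronic_northern(entry, libraries):
    """Report the expression of one database entry across many libraries.

    libraries maps a library name to a pair (abundance, total_clones), where
    abundance maps database entries to abundance numbers in that library.
    Returns (library, count, fraction of clones) rows, highest fraction first.
    """
    rows = []
    for name, (abundance, total_clones) in libraries.items():
        count = abundance.get(entry, 0)  # absent entries count as zero
        rows.append((name, count, count / total_clones))
    return sorted(rows, key=lambda row: (-row[2], row[0]))


if __name__ == "__main__":
    libraries = {
        "HUVEC": ({"HSET": 9}, 5000),   # figures from Table 2
        "LIB_B": ({"HSET": 1}, 1000),   # hypothetical library
        "LIB_C": ({}, 2000),            # hypothetical library
    }
    for row in electronic_northern("HSET", libraries):
        print(row)
```

Reporting the fraction of clones alongside the raw count keeps libraries of different sizes comparable.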

6.11. PHASE I CLINICAL TRIALS 
Based on the establishment of safety and effectiveness 
in the above animal tests, Phase I clinical tests are 
undertaken. Normal patients are subjected to the usual 
preliminary clinical laboratory tests. In addition, 
appropriate specimens are taken and subjected to gene 
transcript analysis. Additional patient specimens are 
taken at predetermined intervals during the test. The 
specimens are subjected to gene transcript analysis as 
described above. In addition, the gene transcript changes 
noted in the earlier rat toxicity study are carefully 
evaluated as clinical markers in the followed patients. 
Changes in the gene transcript analyses are evaluated as 
indicators of toxicity by correlation with clinical signs 
and symptoms and other laboratory results. In addition, 
subtraction is performed on individual patient specimens 
and on averaged patient specimens. The subtraction 
analysis highlights any toxicological changes in the 
treated patients. This is a highly refined determinant of 
toxicity. The subtraction method also annotates clinical 
markers. Further subgroups can be analyzed by subtraction 
analysis, including, for example, 1) segregation by 
occurrence and type of adverse effect; and 2) segregation 
by dosage. 

6.12. GENE TRANSCRIPT IMAGING ANALYSIS IN CLINICAL STUDIES 
A gene transcript imaging analysis (or multiple gene 
transcript imaging analyses) is a useful tool in other 
clinical studies. For example, the differences in gene 
transcript imaging analyses before and after treatment can 
be assessed for patients on placebo and drug treatment. 
This method also effectively screens for clinical markers 
to follow in clinical use of the drug. 

6.13. COMPARATIVE GENE TRANSCRIPT ANALYSIS BETWEEN SPECIES 
The subtraction method can be used to screen cDNA 
libraries from diverse sources. For example, the same cell 
types from different species can be compared by gene 
transcript analysis to screen for specific differences, 
such as in detoxification enzyme systems. Such testing 
aids in the selection and validation of an animal model for 
the commercial purpose of drug screening or toxicological 
testing of drugs intended for human or animal use. When 
the comparison between animals of different species is 
shown in columns for each species, we refer to this as an 
interspecies comparison, or zoo blot. 

Embodiments of this invention may employ databases 
such as those written using the FoxBASE programming 
language commercially available from Microsoft Corporation. 
Other embodiments of the invention employ other databases, 
such as a random peptide database, a polymer database, a 
synthetic oligomer database, or an oligonucleotide database 
of the type described in U.S. Patent 5,270,170, issued 
December 14, 1993 to Cull, et al., PCT International 
Application Publication No. WO 9322684, published November 
11, 1993, PCT International Application Publication No. WO 
9306121, published April 1, 1993, or PCT International 
Application Publication No. WO 9119818, published December 
26, 1991. These four references (whose text is 
incorporated herein by reference) include teaching which 
may be applied in implementing such other embodiments of 
the present invention. 

All references referred to in the preceding text are 
hereby expressly incorporated by reference herein. 

Various modifications and variations of the described 
method and system of the invention will be apparent to 
those skilled in the art without departing from the scope 
and spirit of the invention. Although the invention has 
been described in connection with specific preferred 
embodiments, it should be understood that the invention as 
claimed should not be unduly limited to such specific 
embodiments. 
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TABLE 2 



Clone numbers 15000 through 20000 

Libraries; HUVEC 

Arranged by ABUNDANCE 

Total clones analyzed: 5000 

319 genes,, for a total of 1713 Clones 





number 


N 


c 


entry 


1 


15365 


67 




HSRPL41 


2 


15004 


65 




NCY015004 


3 


15638 


63 




NCy015638 


4 


15390 


50 




NCy015390 


5 


15193 


47 




HSFIBl 


6 


15220 


47 




RRRPL9 


7 


15280 


47 




NCy015280 


8 


15583 


33 




M62060 


9 


15662 


31 




HSACTCGR 


10 


15026 


29 




NCy015026 


11 


15279 


24 




HSEFIAR 


12 


15027 


23 




NCy015027 


13 


15033 


20 




NCy015033 


14 


15198 


20 




NCy015198 


15 


15809 


20 




HS COLLI 


16 


15221 


19 




NCy015221 


17 


15263 


19 




NCy015263 


18 


15290 


19 




NCy015290 


19 


15350 


18 




NCy015350 


20 


15030 


17 




NCy015030 


21 


15234 


17 




NCy015234 


22 


15459 


16 




NCy015459 


23 


15353 


15 




NCy015353 


2'4 


15378 


15 




S76965 


25 


15255 


14 




HUMTHYB4 


26 . 


15401 


14 




HSLIPCR 


27 


15425 


14 




HS POL YAH 


28 


18212 


14 




HUMTHYMA 


29 


18216 


14 




HSMRPl 


30 


15189 


13 




HS18D 


31 


15031 


12 




HUMFKBP 


32 


15306 


12 




HSH2AZ 


33 


15621 


12 




HUMLEC 


34 


15789 


11 




NCY015789 


35 


16578 


11 




HSRPSll 


36 


16632 


11 




M61984 


37 


18314 


11 




NCY018314 


38 


15367 


10 




NCY015367 


39 


15415 


10 




HSIFNINl 


40 


15633 


10 




HSLDHAR 


41 


15813 


10 




CHKNMHCB 


42 


18210 


10 




, NCY018210 


43 


18233 


10 




HSRPII140 


44 


18996 


10 




NCY018996 


45 


15088 


9 




HUMFERL 


46 


15714 


9 




NCY015714 


47 


15720 


9 




NCy015720 


48 


15863 


9 




NCY015863 


49 


16121 


9 




HSET 


50 


18252 


9 




NCy018252 


51 


15351 . 


8 




HUMALBP 


52 


15370 


8 




NCY015370 



descriptor 

Riboptn L41 

INCYTE 015004 

INCYTE 015638 

INCYTE 015390 

Fibronectin 

Riboptn L9 

INCYTE 015280 

EST HHCH09 (ICR) 

Actin, gamma . 

INCYTE 015026 

Elf 1-alpha 

INCYTE 015027 

INCYTE 015033 

INCYTE 015198 

Collagenase 

INCYTE 015221 

INCYTE 015263 

INCYTE 015290 

INCYTE 015350 

INCYTE 015030 

INCYTE 015234 

INCYTE 015459 

INCYTE 015353 

Ptn kinase inhib 

Thymosin beta-4 

Lipocortin I 

Poly-A bp 

Thymosin, alpha 

Motility relat ptn; MRP-1;CD- 

Interferon indue ptn 1-8D 

FK506 bp 

Histone H2A 

Lectin, B-galbp, 14kDa 
INCYTE 015789 
Riboptn Sll 
EST HHCA13 (ICR) 
INCYTE 018314 
INCYTE 015367 
interferon indue mRNA 
Lactate dehydrogenase 
C Myosin heavy chain B 
INCYTE 018210 
RNA polymerase II 
INCYTE 018996 
Ferritin, light chain 
INCYTE 015714 
INCYTE 015720 
INCYTE 015863 
Endothelin 
INCYTE 018252 
Lipid bp, adipocyte 
INCYTE 015370 
TABLE 2 Con't

      number   N   c   entry       descriptor
 53   15670    8       BTCIASHI    NADH-ubiq oxidoreductase
 54   15795    8       NCY015795   INCYTE 015795
 55   16245    8       NCY016245   INCYTE 016245
 56   18262    8       NCY018262   INCYTE 018262
 57   18321    8       HSRPL17     Riboptn L17
 58   15126    7       XLRPLIBRF   Riboptn L1
 59   15133    7       HSAC07      Actin, beta
 60   15245    7       NCY015245   INCYTE 015245
 61   15288    7       NCY015288   INCYTE 015288
 62   15294    7       HSGAPDR     G-3-PD
 63   15442    7       HUMLAMB     Laminin receptor, 54kDa
 64   15485    7       HSNGMRNA    Uracil DNA glycosylase
 65   16646    7       NCY016646   INCYTE 016646
 66   18003    7       HUMPAIA     Plsmnogen activ gene
 67   15032    6       HUMUB       Ubiquitin
 68   15267    6       HSRPS8      Riboptn S8
 69   15295    6       NCY015295   INCYTE 015295
 70   15458    6       RNRPSIOR    Riboptn S10
 71   15832    6       RSGALEM     UDP-galactose epimerase
 72   15928    6       HUMAPOJ     Apolipoptn J
 73   16598    6       HUMTBBM40   Tubulin, beta
 74   18218    5                   INCYTE 018218
 75   18499    5       HSP27       Hydrophobic ptn p27
 76   18963    5                   INCYTE 018963
 77   18997    6                   INCYTE 018997
 78   15432    5                   Galactosidase A, alpha
 79   15475    5                   INCYTE 015475
 80   15721    5                   INCYTE 015721
 81   15865    5                   INCYTE 015865
 82   16270    5       NCY016270   INCYTE 016270
 83   16886    5       NCY016886   INCYTE 016886
 84   18500    5       NCY018500   INCYTE 018500
 85   18503    5       NCY018503   INCYTE 018503
 86   19672    5       RRRPL34     Riboptn L34
 87   15086    4       XLRPLIAR    Riboptn L1a
 88   15113    4       HUMIFNWRS   tRNA synthetase, trp
 89   15242    4       NCY015242   INCYTE 015242
 90   15249    4       NCY015249   INCYTE 015249
 91   15377    4       NCY015377   INCYTE 015377
 92   15407    4       NCY015407   INCYTE 015407
 93   15473    4       NCY015473   INCYTE 015473
 94   15588    4       HSRPS12     Riboptn S12
 95   15684    4       HSEFIG      Elf 1-gamma
 96   15782    4       NCY015782   INCYTE 015782
 97   15916    4       HSRPS18     Riboptn S18
 98   15930    4       NCY015930   INCYTE 015930
 99   16108    4       NCY016108   INCYTE 016108
100   16133    4       NCY016133   INCYTE 016133
TABLE 4

Libraries: THP-1
Subtracting: HMC
Sorted by ABUNDANCE
Total clones analyzed: 7375
1057 genes, for a total of 2151 clones

number  entry      s  descriptor                   bgfreq       rfend        ratio
10022   HUMIL1        IL 1-beta                    0            131          262.00
10036   HSMDNCF       IL-8                         0            119          238.00
10089   HSLAG1CDN     Lymphocyte activ gene        0            71           142.00
10060   HUMTCSM       RANTES                       0            23           46.000
10003   HUMMIP1A      MIP-1                        3            121          40.333
10689   HSOP          Osteopontin                  0            20           40.000
11050   NCY011050     INCYTE 011050                0            17           34.000
10937   HSTNFR        TNF-alpha                    0            17           34.000
10176   HSSOD         Superoxide dismutase         0            [illegible]  [illegible]
10886   HSCDW40       B-cell activ/NGF-relat       [illegible]  [illegible]  [illegible]
10186   HUMAPR        Early resp PMA-induc         0            [illegible]  [illegible]
10967   HUMGDN        PN-1, glial-deriv            0            [illegible]  18.000
11353   NCY011353     INCYTE 011353                0            [illegible]  [illegible]
10298   NCY010298     INCYTE 010298                0            7            14.000
10215   HUM4COLA      Collagenase, type IV         0            5            12.000
10276   NCY010276     INCYTE 010276                0            5            12.000
10488   NCY010488     INCYTE 010488                0            [illegible]  12.000
11138   NCY011138     INCYTE 011138                0            [illegible]  12.000
10037   HUMCAPPRO     Adenylate cyclase            1            10           10.000
10840   HUMADCY       Adenylate cyclase            0            5            10.000
10672   HSCD44E       Cell adhesion glptn          0            5            10.000
12837   HUMCYCLOX     Cyclooxygenase-2             0            5            10.000
10001   NCY010001     INCYTE 010001                0            5            10.000
10005   NCY010005     INCYTE 010005                0            5            10.000
10294   NCY010294     INCYTE 010294                0            5            10.000
10297   NCY010297     INCYTE 010297                0            5            10.000
10403   NCY010403     INCYTE 010403                0            5            10.000
10699   NCY010699     INCYTE 010699                0            5            10.000
10966   NCY010966     INCYTE 010966                0            5            10.000
12092   NCY012092     INCYTE 012092                0            5            10.000
12549   HSRHOB        Oncogene rho                 0            5            10.000
10691   HUMARFIBA     ADP-ribosylation fctr        0            4            8.000
12106   HSADSS        Adenylosuccinate synthetase  0            4            8.000
10194   HSCATHL       Cathepsin L                  0            4            8.000
10479   CLMCYCA       Cyclin A                     0            4            8.000
10031   NCY010031     INCYTE 010031                0            4            8.000
10203   NCY010203     INCYTE 010203                0            4            8.000
10288   NCY010288     INCYTE 010288                0            4            8.000
10372   NCY010372     INCYTE 010372                0            4            8.000
10471   NCY010471     INCYTE 010471                0            4            8.000
10484   NCY010484     INCYTE 010484                0            4            8.000
10859   NCY010859     INCYTE 010859                0            4            8.000
10890   NCY010890     INCYTE 010890                0            4            8.000
11511   NCY011511     INCYTE 011511                0            4            8.000
11868   NCY011868     INCYTE 011868                0            4            6.000
12820   NCY012820     INCYTE 012820                0            4            8.000
10133   HSIIRAP       IL-1 antagonist              0            4            8.000
10516   HUMP2A        Phosphatase, regul 2A        0            4            8.000
11063   HUMB94        TNF-induc response           0            4            8.000
11140   HSHB15RNA     HB15 gene; new Ig            0            3            6.000
10788   NCY001713     INCYTE 001713                0            3            6.000
10033   NCY010033     INCYTE 010033                0            3            6.000
10035   NCY010035     INCYTE 010035                0            3            6.000
10084   NCY010084     INCYTE 010084                0            3            6.000
10236   NCY010236     INCYTE 010236                0            3            6.000
10383   NCY010383     INCYTE 010383                0            3            6.000
TABLE 4 Con't

number  entry      s  descriptor      bgfreq  rfend  ratio
10450   NCY010450     INCYTE 010450   0       3      6.000
10470   NCY010470     INCYTE 010470   0       3      6.000
10504   NCY010504     INCYTE 010504   0       3      6.000
10507   NCY010507     INCYTE 010507   0       3      6.000
10598   NCY010598     INCYTE 010598   0       3      6.000
10779   NCY010779     INCYTE 010779   0       3      6.000
10909   NCY010909     INCYTE 010909   0       3      6.000
10976   NCY010976     INCYTE 010976   0       3      6.000
10985   NCY010985     INCYTE 010985   0       3      6.000
11052   NCY011052     INCYTE 011052   0       3      6.000
11068   NCY011068     INCYTE 011068   0       3      6.000
11134   NCY011134     INCYTE 011134   0       3      6.000
11136   NCY011136     INCYTE 011136   0       3      6.000
11191   NCY011191     INCYTE 011191   0       3      6.000
11219   NCY011219     INCYTE 011219   0       3      6.000
11386   NCY011386     INCYTE 011386   0       3      6.000
11403   NCY011403     INCYTE 011403   0       3      6.000
11460   NCY011460     INCYTE 011460   0       3      6.000
11618   NCY011618     INCYTE 011618   0       3      6.000
11686   NCY011686     INCYTE 011686   0       3      6.000
12021   NCY012021     INCYTE 012021   0       3      6.000
12025   NCY012025     INCYTE 012025   0       3      6.000
12320   NCY012320     INCYTE 012320   0       3      6.000
12330   NCY012330     INCYTE 012330   0       3      6.000
12853   NCY012853     INCYTE 012853   0       3      6.000
14386   NCY014386     INCYTE 014386   0       3      6.000
14391   NCY014391     INCYTE 014391   0       3      6.000
TABLE 5

* Master menu for SUBTRACTION output

SET TALK OFF

SET SAFETY OFF

SET SACT OStT 

SET TYPEAHEAD TO 0

CLEAR

SET DEVICE TO SCREEN

USE "SmartGuy:FoxBASE+/Mac:fox files:Clones.dbf"

GO TOP

STORE NUMBER TO INITIATE
GO BOTTOM

STORE NUMBER TO TERMINATE
STORE ' ' TO Target1

STORE ' ' TO Target2

STORE ' ' TO Target3

STORE ' ' TO Object1

STORE ' ' TO Object2

STORE ' ' TO Object3

STORE 0 TO ANAL
STORE 0 TO EMATCH
STORE 0 TO HMATCH
STORE 0 TO OMATCH
STORE 0 TO IMATCH
STORE 0 TO PTF
STORE 1 TO BAIL
DO WHILE .T.

* Program.: Subtraction 2.fmt

* Version.: FoxBASE+/Mac, revision 1.10

* Notes....: Format file Subtraction 2



SCREEDT 1 TyPE 0 HEADING -Screen !• AT 40,2 SIZE 28S,492 PIXELS PONT •QeneVB'.S COLOR 0 0 D 
d PIXELS 75,120 TO 178,241 SmS 3871 COLOR 0,0,-1,24610,-178947 ' ' ' 

0 PIXELS 27,134 SAY 'Subtraction Menu- SIYLE 65536 FOOT ^Geneva', 274 COLOR 0 0-1-1 ^1 -1 

0 PIXELS.117,126 GET DIATCH STYLE 6S536 TOOT •Chicago^l2 PiSSrB '^C^c^'- SK^^^^ 

e -PIXELS 135 ,.126 GET HMATCH ST^fi 65536 FTOT 'Chicago- 12 PICT^ 'C'C ^locouf? StIb IQ^ 
e PIXELS 163,126 GET OHATCH SIYLE 65536 FONT -Sica^o' 12 PlCnmi 4*C Oth^™? SI^IS 84 
6 PIXELS 90,152 SAY -Matcheaf. SITOE 65536 FOfcW -Ceievi- ,12 TOWR o!o -1-1 -1^1 
€ PIXELS 171,126 GET Imatch STifLE 65536 FOTT •ChicagoM2 PIOITOB "Q^C Incvte" 'ciZE lS « ro 
e PIXELS 252,137 GET itlitiate STVLE 0 POTT 'Geneva-flS SIZE 15^ COLOR oT^l -i ^ 

a PIXELS 252,236 GET tennlnate STYLE 0 FQMT •G«nevaM2 SIZE 15,70 COLOR 6.6, -i -i -1 -1 

1 2!?'" -include clones- - STYLE 65536 foot 'Geneva? iS 55^ 0 0 -1 -1 Ii 

•2 ^5536 FONT •GenevaM4 COLOR CoT-lT^l/ll ' ' ' ^ 

e Pim,S 198,126 GET PTF 6TXLE 65536 FWV •Chidago-,12 PICTORE -6*0 Print to file- ETZS 15' 9 
e-piXELB 90,9 TO 191,109 STYLE 3871 CCmi 0,0,. i;-25600, -1,-1 ^ '"^^"^ SIZr. 15,9 

d PIXELS 90,388 TO*191,397 STYLE 3871 COLOR 0,0,-1,-25600,-1,-1 

a PIXELS 81,296 SWT 'Background:* STYIE 65536 PCNT -Geneva', 270 COLOR 0.0,-1 -1 -1 -1 

I 1^ f^H^mJi^ -Chicago-^U PICTORE 'S-R Ovekil ; mitiin- SIZE 4 

0 PIXELS 81,26 6Ay "Targetj" STYLE 65536 FONT •GBneva-,270 COLOR 0,0,-1 -i -i 
5 PIXELS 108,20 GET targetl STYLE 0 PCMT 'Geneva-,9 SIZE 12,79 COLOR 0.0.-1 -1 -1 -i 
•Q PIXELS 135,20 GET targets STYLE 0 PCOT -Geneva', 9 SIZE 12,79 COLOR 0,0 -I -l' -1-1 
.8 PIXELS 162,20 GET targets STYLE 0 PCNT -Geneva-.? SIZE 12 79 COLOR 0 0 -1 -1-1 -i 
e PDCQ5 108,-299 GET objectl 6TYUE 0 PCOT 'Geneva', 9 SIZE 12,79^^ 6,6,-1.-1-1 -1 
« PCE^ 135,299 OET object2 STYLE 0. FOOT "Geneva- 9 SIZE 12 79 cSw 0 0^1 -llll'li 
8 PIXELS 162,299 GET 6bject3 STYLE 0 KOT -Geneva- 9 SIZE 12 79 COLOR 0 0 -I -I -1 Ij 
•8 PIXELS 276,324-GEr Bail STYLE 65536 FONT -Chicago-, 12 PICtGre -e*R Ruii;BaU oit- SIZE 4112 



* EOF: Subtraction 2.fmt
READ
IF Bail=2

CLOSE DATABASES

USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
SET SAFETY ON
SCREEN 1 OFF
RETURN
ENDIF
STORE VAL(SYS(2)) TO STARTIME
STORE UPPER(Target1) TO Target1
STORE UPPER(Target2) TO Target2
STORE UPPER(Target3) TO Target3
STORE UPPER(Object1) TO Object1
STORE UPPER(Object2) TO Object2
STORE UPPER(Object3) TO Object3
CLEAR

SET W 

GAP = TERMINATE-INITIATE+1
GO BOTIWE 

SSf'L^Sl^^ NU>QERaibrary,D.P,2,R,EimY,S,riBSCRrPTOR,ST7mT,R^ TO TEMENUM 

USE TEMPNUM

COUNT TO TOT

COPY TO TEMPRED FOR D='E' .OR. D='O' .OR. D='H' .OR. D='N' .OR. D='I'
USE TEMPRED

IF Ematch=0 .AND. Hmatch=0 .AND. Omatch=0 .AND. IMATCH=0

COPY TO TEMPDESIG

ELSE

COPY STRUCTURE TO TEMPDESIG
USE TEMPDESIG
IF Ematch=1

APPEND FROM TEMPNUM FOR D='E'
ENDIF

IF Hmatch=1

APPEND FROM TEMPNUM FOR D='H'
ENDIF

IF Omatch=1

APPEND FROM TEMPNUM FOR D='O'

ENDIF

IF Imatch=1

APPEND FROM TEMPNUM FOR D='I' .OR. D='X' .OR. D='N'

ENDIF
ENDIF

COUNT TO STARTOT

COPY STRUCTURE TO TEMPLIB
USE TEMPLIB

APPEND FROM TEMPDESIG FOR library=UPPER(target1)

IF target2<>' '

APPEND FROM TEMPDESIG FOR library=UPPER(target2)

ENDIF

IF target3<>' '

APPEND FROM TEMPDESIG FOR library=UPPER(target3)

ENDIF
COUNT TO ANALTOT

USE TEMPDESIG

COPY STRUCTURE TO TEMPSUB

USE TEMPSUB

APPEND FROM TEMPDESIG FOR library=UPPER(Object1)
IF target2<>' '

APPEND FROM TEMPDESIG FOR library=UPPER(Object2)
ENDIF

IF target3<>' '

APPEND FROM TEMPDESIG FOR library=UPPER(Object3)

ENDIF
COUNT TO SUBTRACTOT
SET TALK OFF
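
The selection steps in the listing above (restrict clones by designation code, then split them into target and subtracting libraries) can be sketched in Python as follows. This is an illustrative sketch only, not the program in the exhibit: the `Clone` record, the function names and the sample data are hypothetical, the designation groups follow the `D=` tests as far as the scanned listing can be read, and upper-casing both sides of the library comparison is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Clone:
    number: int
    library: str
    d: str      # designation code
    entry: str


# Codes kept when no designation switch is set (the TEMPRED filter in the listing).
BASE_CODES = {"E", "O", "H", "N", "I"}

# Codes appended for each menu switch, as read from the listing.
SWITCH_CODES = {
    "ematch": {"E"},
    "hmatch": {"H"},
    "omatch": {"O"},
    "imatch": {"I", "X", "N"},
}


def filter_by_designation(clones, **switches):
    """Keep clones matching any enabled switch; with none enabled, keep BASE_CODES."""
    enabled = [name for name, on in switches.items() if on]
    if enabled:
        allowed = set().union(*(SWITCH_CODES[name] for name in enabled))
    else:
        allowed = BASE_CODES
    return [c for c in clones if c.d in allowed]


def select_libraries(clones, names):
    """Keep clones whose library matches one of the non-blank names."""
    wanted = {n.strip().upper() for n in names if n.strip()}
    return [c for c in clones if c.library.upper() in wanted]
```

In this sketch the target set would be `select_libraries(filtered, [target1, target2, target3])` and the subtracting set the same call with the three object names.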

* COMPRESSION SUBROUTINE A
? 'COMPRESSING QUERY LIBRARY'
USE TEMPLIB

SORT ON ENTRY,NUMBER TO LIBSORT

USE LIBSORT

COUNT TO IDGENE

REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0

DO WHILE SW2=0 ROLL
IF MARK1 >= IDGENE
PACK

COUNT TO AUNIQUE

SW2=1

LOOP

ENDIF
GO MARK1
DUP = 1

STORE ENTRY TO TESTA
STORE D TO DESIGA

SW = 0

DO WHILE SW=0 TEST

SKIP

STORE ENTRY TO TESTB
STORE D TO DESIGB

IF TESTA = TESTB .AND. DESIGA=DESIGB

DUP = DUP+1
LOOP
ENDIF
GO MARK1

REPLACE RFEND WITH DUP
MARK1 = MARK1+DUP

LOOP

ENDDO TEST
LOOP

ENDDO ROLL

SORT ON RFEND/D,NUMBER TO TEMPTARSORT
USE TEMPTARSORT

*REPLACE ALL START WITH RFEND/IDGENE*10000
COUNT TO TEMPTARCO

* COMPRESSION SUBROUTINE B

? 'COMPRESSING TARGET LIBRARY'

USE TEMPSUB

SORT ON ENTRY,NUMBER TO SUBSORT
USE SUBSORT
COUNT TO SUBGENE
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2=0

DO WHILE SW2=0 ROLL
IF MARK1 >= SUBGENE
PACK

COUNT TO BUNIQUE

SW2=1

LOOP

ENDIF
GO MARK1
DUP = 1

STORE ENTRY TO TESTA
STORE D TO DESIGA
SW = 0

DO WHILE SW=0 TEST
SKIP

STORE ENTRY TO TESTB

STORE D TO DESIGB

IF TESTA = TESTB .AND. DESIGA=DESIGB
DUP = DUP+1
LOOP

GO MARK1

REPLACE RFEND WITH DUP
MARK1 = MARK1+DUP

LOOP

ENDDO TEST
LOOP
ENDDO ROLL

SORT ON RFEND/D,NUMBER TO TEMPSUBSORT
USE TEMPSUBSORT

*REPLACE ALL START WITH RFEND/IDGENE*10000
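
The two compression subroutines above appear to collapse clones that share an entry and designation into a single record whose RFEND field holds the number of duplicates, and then sort by that count in descending order. A minimal Python sketch of the same idea follows. The function name and the sample data are hypothetical, and the `start` value (clones per 10,000) corresponds to a line that is commented out in the listing, so it is included only as an assumption about its intent.

```python
from collections import Counter


def compress(records):
    """Collapse (entry, designation) pairs into unique genes with a clone count.

    records: iterable of (entry, designation) tuples, one per clone.
    Returns rows sorted by descending count, then by entry.
    """
    counts = Counter(records)
    total = sum(counts.values())
    rows = [
        {"entry": entry, "d": d, "rfend": n, "start": n / total * 10000}
        for (entry, d), n in counts.items()
    ]
    rows.sort(key=lambda r: (-r["rfend"], r["entry"]))
    return rows
```

A hash-based count replaces the sort-and-scan loop of the listing; the result is the same set of unique genes with their abundances.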



*FUSION ROUTINE

? 'SUBTRACTING LIBRARIES'

USE SUBTRACTION

COPY STRUCTURE TO CRUNCHER

SELECT 2

USE TEMPSUBSORT

USE CRUNCHER
APPEND FROM TEMPTARSORT
COUNT TO BAILOUT
MARK = 0

DO WHILE .T.

MARK = MARK+1

IF MARK>BAILOUT

EXIT

ENDIF
GO MARK

STORE ENTRY TO SCANNER
SELECT 2

LOCATE FOR ENTRY=SCANNER
IF FOUND()

STORE RFEND TO BIT1
STORE RFEND TO BIT2

ELSE

STORE 1/2 TO BIT1

STORE 0 TO BIT2

ENDIF

REPLACE BGFREQ WITH BIT2
REPLACE ACTUAL WITH BIT1
LOOP
ENDDO

SELECT 1

REPLACE ALL RATIO WITH RFEND/ACTUAL
? 'DOING FINAL SORT BY RATIO'

SORT ON RATIO/D,BGFREQ/D,DESCRIPTOR TO FINAL
USE FINAL
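
The fusion routine above computes the ratio column shown in Table 4: each gene's count in the target library (rfend) is divided by its count in the subtracting library (bgfreq), with 1/2 substituted when the gene is absent from the subtracting library. The following Python sketch shows that arithmetic under stated assumptions: the function name and the dictionary inputs are hypothetical, and the tie-breaking order after the ratio is only a guess at the partly illegible SORT line.

```python
def subtract(target, background):
    """Compute abundance ratios of target-library genes against a background.

    target, background: dicts mapping entry name -> clone count.
    Returns rows sorted by descending ratio.
    """
    rows = []
    for entry, rfend in target.items():
        bgfreq = background.get(entry, 0)
        # A gene absent from the background is treated as half a clone,
        # which avoids division by zero (the 1/2 stored in the listing).
        actual = bgfreq if bgfreq > 0 else 0.5
        rows.append(
            {"entry": entry, "bgfreq": bgfreq, "rfend": rfend, "ratio": rfend / actual}
        )
    rows.sort(key=lambda r: (-r["ratio"], -r["bgfreq"], r["entry"]))
    return rows
```

With the first and fifth rows of Table 4 as input (131 clones against 0, and 121 clones against 3), the sketch gives ratios of 262 and 40.333, matching the printed values.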



SET TALK OFF

DO CASE

CASE PTF=0

SET DEVICE TO PRINT

SET PRINT ON

EJECT

CASE PTF=1

SET ALTERNATE TO "Adenoid:Patent Figures:Subtraction.txt"
SET ALTERNATE ON

ENDCASE

STORE VAL(SYS(2)) TO FINTIME
IF FINTIME<STARTIME

STORE FINTIME-STARTIME TO COMPSEC
STORE COMPSEC/60 TO COMPMIN
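
The timing lines above take the difference between two clock readings and convert it to minutes. The body of the IF FINTIME<STARTIME test is not legible in the scan; the Python sketch below assumes that the readings are seconds since midnight and that the test exists to correct for a run that crosses midnight. Both points are assumptions, and the function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def elapsed_minutes(start_sec, finish_sec):
    """Elapsed run time in minutes from two seconds-since-midnight readings."""
    if finish_sec < start_sec:
        # Assumed correction: the run started before midnight and ended after it.
        finish_sec += SECONDS_PER_DAY
    return (finish_sec - start_sec) / 60
```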



•BHT MRRGIN TO 10 

BAY •Lihraiy Subtraction Analysis" Bms 65536 FONT ■Geneva'i274 COLOR 0, 0,0,-1, -X, 

?
?
?

? date()
?? ' '
?? TIME()

? 'Clone numbers '
?? STR(INITIATE,5,0)
?? ' through '
?? STR(TERMINATE,6,0)
? 'Libraries: '
? Target1
IF Target2<>' '
?? ', '

?? Target2
ENDIF

IF Target3<>' '
?? ', '
?? Target3
ENDIF

? 'Subtracting: '
? Object1
IF Object2<>' '
?? ', '
?? Object2

ENDIF

IF Object3<>' '
?? ', '
?? Object3
ENDIF

? 'Designations: '

IF Ematch=0 .AND. Hmatch=0 .AND. Omatch=0 .AND. IMATCH=0
?? 'All'

ENDIF
IF Ematch=1
?? 'Exact, '

ENDIF

IF Hmatch=1
?? 'Human, '
ENDIF

IF Omatch=1
?? 'Other sp.'
ENDIF

IF Imatch=1
?? 'INCYTE'
ENDIF

IF ANAL=1
? 'Sorted by ABUNDANCE'
ENDIF
IF ANAL=2

? 'Arranged by FUNCTION'
ENDIF
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7 '?otal cilones represented! * 

7 'Total clanea ana^zed: ' 

? "TDtftl.cxJirputaticn.tiJWi* 
•?? STR(C0MmrN,5,2) " 
?7 * minutes* * 
? • 

r/d » deaijpatiea f « di3trlbutioTi * - location, r = function s » spebies i « inte 
SCH^^l lYPB 0 -Screen 1- AT 40,2 BlZZ 286,4^2 PIXELS FWr -GenevaVg COLOR o,o,0, 

CASE ANALal 

?? * genes, for a total of » • 
..?? 6TR(AN&r;rOT,4.0) • 
7? ' clones' 

CLOSB D^ITAfiAS&S , 

*USE/EirartGuyiroxBASE+/Mae!fax files : clones, dbf 

'* arranga/ function 
SEP PIUOT' CN 

SCREEN 1 T«^.0 HffiDBXS "Screen r.AT 40,2 SIZE 286,492 PIXELS -FWr -Helvetica" , 268 COLOR 0 
'J * BINDING PJIOTSIMS* 

■f^.it.SiiS^'S^eiSf?^''''^'' 286,4.92 PIXELS ..Helv«lca.',26S COLOK 0 

f^Jal^A^^:?^^ "''^ SIZE. 2B6.492 m=LS ra^ .Helvetica. ,265 COLOR 0 

^l^.^^in^T'^ •Kelv.tiea..265 COLOR 0 

KKEENl WPE-0 H^CnW 'Screen 1* AT-40,'2 SIZE 286,492 fVSlS SWT •GeBe««. 7 Mtne n n n 
list OPP Cieldfl nui«b8r,O,F,2,R,ElTOV,S,I3BSCRIPTOR,8SFaSQ,R^,MS0.I^ 

|CpN i T»E 0 HEADb«: ■^""«»/;*2oISe|.'"= 'Helvetica ',268 pOLOR 0 

f^iarSeSgS^ -Screeh-l.AT 40,2 SIZE 286.492 J-KelS mrp •.HeXvetlca..265 COLOR o' 
fSfSpl ?IS4S?»;i^R"^^ffeicg?^If|=^f^ 0,0.0. 

f^-bSg'iSSS, •Helv«;ica..265 COLOR 0 
SCraS«l WPE 0 HEfinOJG •Sereen 1- AT 40,2 SIZE 286,492 PUfBLS FONT 'Geneva'? CQTiM on n 
ilst OFP fields nuniber.D,P,z,R.Emy.S.DESCRlPiOH.flSraEQ,Rrao,RASo.l ^^Jo^ ' 
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fSil SSeSt^"*" '"^'"^ «^ fflCELS rem .H0lV«tica..365 COLOR 0 

SCRSEN 1 TYPE 0 READING ■Screen !• AT 40 2 stzp adfi* ^ofs tn-v-^ - 

?• 'Xiiaaes and Wioflphataseif- SIZE 386,492 PEELS FCNT •HelvBtica-.JSS COEOR 0 

fS4:XSo'S?SSJff"" . ™T -tovcW.,,,, COLOR 0 

BCREBN 1 WPE 0 HEADING 'Screen 1* AT 40 2 STZE 2Sfi ^o^ btwt ^ 

SCREEN 1 TVPE 0 HEADUTO ^Screeii !• A!P 40 2 ^ttp ^flii i*a'5 «««^ - 

? • • . PROTEIN smic »SralYSSSs? ^ -Helvetica- ,268 ^corm 0 

SCTEEM 1 TifPB 0 HEADIN3 ■Screen !• AT 40 9 rr?r -9Pff am ^'f^^ [ 

T-^rranscription aM NuclIS Sid-bindi^ pS?eJ|f:^^^ '^^^ ^ ■Helvetica- ,2SS COLOR 0 
SCSEEN 1 TYPE 0 HEADING "Screen 1' AT iOStmv lae AO'i btv^. " — >. 

list, OPT £tews n«n4^.DjrsTE^5ys?SEicSi^ific^fS,RS5.;°^r^ 

fSil2^^°.'^°^ -screen 1- AT 40,2 SIZE 286,492 PIXELS "fo^ -Kelv^tica. ,265 COLOR. 0 

SCREEN 1 TSfPE 0 HEADIM3 -Screen !• AT 40,2 SIZE 3flfi Ati> nrvrre • 

lisf OFF fields r^^.O,T.Z.K.E^;s''z^A,S^^^^ ^'"'O. 

f^c^i^"^ '^^'^^ ^' ^.'S 286,492 PIXELS .Helvetic.. .265 COLOR 0 

SCR2SN 1 TSfPE 0 HEADING ■Screen 1* AT 40 2 srrr 5flc .(o-i vrm^ 

liat OFT £iel* r^.^;T.z7:^^s'°t^J^',l^^,^^^^^^ 0.0,0, 

f^StiiTS^^^ "'2 SIZE 286.492 PIXELS FO,^ -Kelvettca. ,-265 COa« 0 

SCRBEK 1 TYPE 0 KEADD3G ■Screen 1" AT in a er^r ^oc >ibo 

list opp fields ^.^^^^'is'^z^c^^''^^ 

SCRE^ 1 TVPE 0 HE^U^IKG -Screen l- at 40..2 St2E 286,492 PlXELS . PO^ "Helvetica- .268 COl^R 0 



^ ' ENZVMES 



?'^Sl«°el^^ ''"^ ^"^ ^ ■Helvetica..26S COLOR 0 

SCREEN 1 TH'B 0 HEtoiNG ■Screen !• AT 40 2 stze ons 40.5 at^^ •-^w 

llBt OFT fields «u„^,B,F.zXs^"sSyi^f^f^.^^^ 0.0.0,. 

f^=L2^an°d*Sg?t:^a^^-^' "'^ -Helvetic... 265 COLOR 0 

SCREEN 1 TYPE 0 HEADING "acreen 1" AT aq 5 CT7r -iotf 

list OTP fields nu„^.0.l?lrr^!s'S3icI!?^?|^^S,RS?S.I^^ 

f^diti^tS6^a;&..^" "'^ ™ ^ •Helvetica..265 COLOR 0 

SCREEN 1 TYPE 0 KBADINd ■ScrBen !• AT 9 ct^tp "ioc jtnn 

list OPP field, r>^r.J>.f:^^^s'':il^^^li^^ 0.0,0, 

f^J SL°1^?*^' "^'^"^ "'^ -Kelvetica-.SSS COIOR 0 

SCREEN 1 TYPE 0 HEADIKO "Screen 1* AT 40 2 €»Trr ^oe t,,^ « 

list 0^ fields ?-^r.D.P:?XH^mi?s?^cJl'^=fy^f^,^- 0.0.0, 

fSo L^^l,;?""™ "'^ •Hel>;etica..26S COU* 0 

SCRE^ 1 T«E 0 HEADING -Screea 1- AT 40.2 SIZS 286,492 PIXELS TO^ -Oeneva-.T COLOR O.O.O; 
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list OPP fields aurolMr,0,P,Z»R.EWm,S, DESCRIPTOR, BO^^BQ,ElPE^©,RM FOR Rs'M' 

fiCREEN 1 TOTE 0. HEADING -Screen !• AT 40,2 SIZE 286,492 P1X2LS PCNT •^f^?lc|-,9S ^SoR o 
7 '^ocleic acid aetaboliEm: *- - * * . 

BCSWl l.^KFB O HEAOI^ "Screen '1* 40,2 SIZE 286,492 PIXELS POTT 'Geneva",? COLOR 0,0,0* 
list. OFF fields nuniber,D,F,Z,R,ENTR^,S, DESCRIPTOR, BGFREQ,RF£»n>, RATIO, FOR Rc*N' ' 

•BCREm'l TOTE 0 KEAimxS 'Screen 1" AT 40,2 size 286,492 PIXELS' PCNT 'Helvetica*, 2 65 COLOR 0 
7 'Lip id inetaboliam: ' 

SCRBBM 1 TOTE 0 HEADINO 'Screen 1- AT 40,2 SIZE 266,492 PIXELS PCNT -Geneva", 7 COLOR 0,0,0, 
list OFF fields liurnber,D,F,2,R.EmRy,S,DESCRIPIOR,BGFREQ,R?END,RATI0, 1 FOR R»*W' 

BCRESN 1 TOTE 0 HEADINQ 'Screeit 1' AT 40,2 SIZE 286,492 PIXELS FOOT "Helvetica ',2 65 COLOR 0 
? 'Other enzynea; ' 

SCREQI I TOTE 0 HKADIN3 -Screen I- AT 40,2 SIZE 286,492 PIXELS FOOT "Geneva ',7* COLOR 0,0,0 
liflt OPP fields ttUniber,D,F,2,R,EB7n(!f,S,nESCRIPTOR,BGFRBQ,RFEWD,RATIO,I FOR R='E' ' 

7 . . ... . . .' 

SCREEN 1 TSfPE 0 HEADI29S •Screen 1" AT 40,2 SIZE 286,492 PIXELS FCKT 'Helvetica-, 2 68 COLOR 0 

7 * MISCELLMJEOOS CAIKORIES' 

7 

SCREQJ 1 TOTE 0 HEADINS -Screen 1- AT 40,2 SIZE 286,492 PIXELS FOOT 'Helvetica', 2 65 c6l0R 0 
7 'Screes responaei' ' * 
SCREEN l TOTE 0 HEW)D«3 'Screen 1- AT 40,2 SIZE 286,492 PIXELS FOOT 'Geneve", 7 COLOR 0,0,0. 
Ixst OFF fields nuniber,D,FvZ,R,EriRy,S,tJB6CRlPTOR,BGFI^,RF22©,miO,i FOR R='H' 

SCREEN 1 TOTE 0 HEADHTO 'Screen 1" AT 40,2 SIZE 286,492 PIXELS FOOT 'Helvetica" , 265 COLOR "0 
7 *Struct^aral: • ' . . u 

SCREEN 1 TOTE 0 HEAD1MC3 -Screen 1- AT 40,2 SIZE 286.492 PIXELS *FTOOT 'Geneva", 7 COLOR 0,0,0, 
list OFF fields. nui«ber,D,F,2,R,£NIRY,S,DESaaPT0R,BGFR^ Rz='K' 

SCREEN 1 OWE 0 KEAD1M3 -Screen 1' AT 40;2 SIZE 286,492 PDCELS FOOT 'Helvetica 2 65 COLOR -0 
7 • 'Other clones: * 

SCREEN 1 TOTE 0 .HEADING "Screen 1- AT 40,2 SIZE 286,492 PIXELS ' FCOT* •Geneva', 7. COLOR 0 0 0 
list OFF fields nuiriber,D,F»2,R,EOTKSf,9,£lBSCRIPTOR,BGFRSO,RPa03,RAT10, 1 FOR R='X' 

SCREEN 1 TOTS 0 HEADING -Screen X' AT 40,2 SIZE 286,492 PDELS FtOT 'Helvetica-, 2 65 COLOR 0 
7 'Clonea'of uolcnown function t ' 

SOraKl WPS 0 HERDING -Screen 1- AT 40,2 size 286,492 PIXELS FOOT 'Geneva', 7 COLOR 0,0,0, 

list OPP fields nurtber,D,?,Z,R,ENIOT,S,CESCRIPTOR,BGFREQ,RFSMD,RATIO,I FOR R«'U' 

SNDCASE 

DO 'Teat print .parg" 
SET PRIOT OFF 
SET DiSVlCE TO SCREE27 
CLOSE DATABASES 
ERASE TEMFLIB.DBF 

E^UISS TEMPDSSIG.DBF . 

Srr WARGIM TO 0 

CLEAR 

LOOP 

ENDDO 
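
The function-sorted output of the program above prints a heading for each functional class and then lists the records whose R field carries that class's one-letter code. A Python sketch of that grouping follows. It is illustrative only: the function name and record layout are hypothetical, and the code-to-heading table includes only the pairs that can be read with some confidence in the scanned listing, so it should not be taken as the complete or authoritative set.

```python
# Function codes and headings as far as they can be read from the listing.
FUNCTION_HEADINGS = {
    "N": "Nucleic acid metabolism",
    "E": "Other enzymes",
    "H": "Stress response",
    "K": "Structural",
    "X": "Other clones",
    "U": "Clones of unknown function",
}


def report_by_function(rows):
    """Return report lines: one heading per class, followed by its member records."""
    lines = []
    for code, heading in FUNCTION_HEADINGS.items():
        lines.append(f"{heading}:")
        for r in rows:
            if r["r"] == code:
                lines.append(f"  {r['entry']}  {r['descriptor']}")
    return lines
```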
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•^torthern (eiagla) , version 11-25-S4 

close dababae s 

SET T&UC 

SET PRIOT OFF * 

SSr WiCT OFF 

STORE 0 TO Nuirii ' ^ ^^^^ 

STORE 0-TO Zog 
STORE 1 TO Ball 
DO WHILE .T, 
Program. I Northern (single). fot 

♦ Data....: 8/ 8/34 . . * 

• V^sic».t .Pos^BASB+ZKac/ rdviaion i.lO 

1^ Notes. %v.i .Format file Korthem (single) 

SCREEN 1 OVPE 0 HEADING "Screen !• Ar'40 2 stze 5bc ^oo «tv^^ 

9 PaCEtS 115 173 GBP ^Zct em^ .J^ COLOR 0, 0,0,-1,-1,-1 

6 PIXELS 145 89 BAY •DSttipti^sra^?^^ "'"2 C0li«'o,6,0 -1,-1. 



f— COLOR -1* 

~. Northern (single). fntt 
READ 

IP Bails2 
CtBAR . 
s creen 1 off 
'RBIURN 

2P fiobjeoto' 

ifsK'S; "'""'^ ^^try.dbf . 

entry, <abf- 
.I*OCATE POR.LoQJcBBobject 

IF ..NOT.Pocnroo ' 
cuaR 

LOOP 
BRO Wa'jj 

flj^ Entry TO Searchval. 

CLOSE DATABASES 

^ASB ."LooJcup entry, dbf 

SET SJCACT OFF 
SBT EAfOTy OFF 

^fi^'^to^*'''''.'^ •Loo)aip-de3criptor.dbf. 
y^^'LoofcUP ttesoriptor.dbf 

a£AR 
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LOOP 

SKDIP 

BgOW SB 

STORE Batxy TO Searchvkl 
CLOSE lUVTABASSS . ' 
BPASE "Loolcup descriptor.dbf " 
SET EXACT ON • • 

ENOIF • 

IP »j«boO 

CSS "SinartGuytFo}a^5Bf/Mac:Fox files ; clones. dbf^ 
GO NCorib 

.fi TDRg Snkxy. TO Gearchval 

? ■Korthezn azialyala for entry * 
7? Seafdhval 
? . ' 

7 ■E&ccr y to proceed' 

VBOT TO OK ' 

CI£AR 

IP.OT?ER{OK)o'Y< 
screen 1 off 
KEIUFK 
ENDIP 

^ CQK?HES62CN'SUBK00nKS FOk LiJbTdlifyf dbf 

7 'Ccopreasiiig the Liboraries file now.*-..'. 

tXSB "6 toart Guy:PoxBASa^/MaG:Pox f iles: libraries. dbf" 

SET SAPSry OCT* * ' , 

SORT ON libfazy^TO *CompreBBed libraries. dbf* 

* FOR ente red>0 ' 
SET SAFETY ON 

USE 'Coqpressed libraries*. dbf* 
DSLSTE FOR entereds'O ' 
PICK 

OOQNT TO TOT 
SW3aO . 

DO WHILE 5W2sO ROLL 
•IF MMajl >o TOT 

LOOP 

m)iF 

60 MARKl. 
' STORE library TO TESTA 
'SKIP 

STORE Libr ary TO tsstb 

IF TES TA s TKSU'H 
nPT.Tg jg 

EMDIF 

MARKl ^ K2J^+1 ' 
LOOP * 
Et^DDO ROU 

* Korthem an^3yei& 
CLEAR 

7 'Doing the northern new. . < 

SET TALK ON . ■ . . 

USE ■ emart Giy i FoxaASEt /Mac x Fox f lies t clones. ^f"- 

SBT SAFETY OFF 

copy TO "HitB-dbf FOR entryuBeardhyal 
SET SAFETY CN ' 
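
The Northern routine above copies every clone whose entry matches the search value into a hits file. The steps that follow are not included in this part of the listing; the Python sketch below assumes that the hits are then tallied per library, in the manner of an electronic northern blot. That tally, the function name and the input layout are all assumptions.

```python
from collections import Counter


def northern(clones, entry):
    """Count the clones matching one entry in each library.

    clones: iterable of (library, entry) pairs, one per clone.
    """
    return Counter(library for library, e in clones if e == entry)
```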
* MASTER ANALYSIS 3; VERSION 12-9-94

* Master menu for analysis output
CLOSE DATABASES

SET TALK OFF
SET SAFETY OFF
CLEAR

SET DEVICE TO SCREEN

^ ^^^^ ^ •SinartGuy:Po«aASE+/Mac:fox filesiOutput programsi- 
USE "SmartGuy:FoxBASE+/Mac:fox files:Clones.dbf"
GO TOP

STORE NUMBER TO INITIATE
GO BOTTOM

STORE NUMBER TO TERMINATE
STORE 0 TO ENTIRE
STORE 0 TO CONDEN
STORE 0 TO ANAL
STORE 0 TO EMATCH
STORE 0 TO HMATCH
STORE 0 TO OMATCH
STORE 0 TO IMATCH
STORE 0 TO XMATCH
STORE 0 TO PRINTON
STORE 0 TO PTF
DO WHILE .T.

* Program.: Master analysis.fmt

* Date....: 12/9/94

* Version.: FoxBASE+/Mac, revision 1.10

* Notes....: Format file Master analysis



I ^ SmS 3871 COLOR 0,6. ^l, 125600, ll^l 

a nSSf ^ "Customized Output Menu- STYLE 65536 FCNT "Geneva-, 274 COLOR 0 0 -l -i i 

9 Tims 11M2S OCT Bora SKLE «553t mif^S&il- isS^jsi ?J-I^^SS^=ilJf^°?S^ 

I ISS^ lll'^^A ^ STYLE 65536 TOOT "ChicIgc-,12 PiSe "3*0 Inwtl" S?'E fi^CO 

o 252,146 GET initiate STYLE 0 FONT "Geneva"-, 12 SIZE- 15 70 color on i ^f^-^^^.es CO 

9 PDE£ 270,146 GET tetmiaate STVLS 0 TOOT "Ge-neva-.L siL 15 70 o r 1 r\ 



* EOF: Master analysis.fmt
READ

IF ANAL=9

CLEAR

CLOSE DATABASES
ERASE TEMPMASTER.DBF

USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
SET SAFETY ON
SCREEN 1 OFF
RETURN
ENDIF
CLEAR

? INITIATE
? TERMINATE
? CONDEN
? ANAL
? Ematch
? Hmatch
? Omatch
? IMATCH
SET TALK ON

IP EWnREs2 
USE -Uhique libraries :asf« 

REPLACE ALL i WITO ' ' ' 

BROWSE FIELDS i, lihname, library, total, entered AT 0,0 
USE ■Smart:G\y:FoxBASE+/Mae:fox files (clones, dbf" 

copy STRUCTURE TO TQiPLIB 
USE TEMPLIB 
IP EimR£<il 

FROM •amartGuy:PoxBASS+/Mae:fox files ! Clones. dbf • 

IP EWTIREteS 
USB oTteique libraries. dbf • 

COPV TO S5L5 CTED FOR OTPSR (i) o ' Y » 
US3 SELECTED 

STORS R=X:C0U^3T() TO STom 
MARKal 

DO WHILE .T, 

IP KARK>STOPIT 

CLEAR 

EXIT 

SNDI? 

USE SELECTED 
GO MARK 

STORS library TO THISQENE 
? *CO?VTNG ' 
?? THISONE 
USE TSMPLIB 

^'l^in^'^"^'^''''''''"' *il««'Clon«.dbf. FOR Itbraty-TOISONE 
LOO? 
SNDDO 
ENDIP 

USE "SmarzCuyrPoxBASE+ZKacifox files iclonea. dbf • 

COUNT TO STARTOT

COPY STRUCTURE TO TEMPDESIG

USE TEMPDESIG

IF Ematch=0 .AND. Hmatch=0 .AND. Omatch=0 .AND. IMATCH=0
APPEND FROM TEMPLIB
ENDIF

IF Ematch=1

APPEND FROM TEMPLIB FOR D='E'
ENDIF

IF Hmatch=1

APPEND FROM TEMPLIB FOR D='H'
ENDIF

IF Omatch=1

APPEND FROM TEMPLIB FOR D='O'
ENDIF

IF Imatch=1

APPEND FROM TEMPLIB FOR D='I' .OR. D='X' .OR. D='N'
ENDIF

IF Xmatch=1

APPEND FROM TEMPLIB FOR D='X'

ENDIF
COUNT TO ANALTOT
SET TALK OFF
CASE PTF=0

SET DEVICE TO PRINT

SET PRINT ON

EJECT

CASE PTF=1

SET ALTERNATE TO "Total function sort.txt"

•SETT AI/TERNATE TO and O function sort.txt" 

*SET ALTERNATE TO "Shear Stress HUVEC 2:Abundance sort.txt"

*SET ALTERNATE TO "Shear Stress HUVEC 2:Abundance con.txt"

*SET ALTERNATE TO "Shear Stress HUVEC 2:Function sort.txt"

*SET ALTERNATE TO "Shear Stress HUVEC 2:Distribution sort.txt"

*SET ALTERNATE TO "Shear Stress HUVEC 1:Clone list.txt"

*SET ALTERNATE TO "Shear Stress HUVEC 2:Location sort.txt"

SET ALTERNATE ON

ENDCASE



IP PRINT0N=1 

|jU30 SAY -Database Subset Analysis' STiTLE 63336 PONT -Geneva-, 274 COLOR 0, 0,0, -1, -1,-1 

?
?

?

? date()
?? ' '

?? TIME()

? 'Clone numbers '
?? STR(INITIATE,6,0)
?? ' through '
?? STR(TERMINATE,6,0)
? 'Libraries: '
IF ENTIRE=1
? 'All libraries'
ENDIF
IP EOTIRE=2 

MARlUl 

DO WHILE .T. 

IF MARK>STOPXT 

EXIT 

ENDIP 

USE SELECTED, 
CjO MARK 
? • » 

?7 TOIM(lihname) 
STORE MARK+1 TO MARK 
LOOP 
ENDDO 
ENDIF 

? 'Designations: '

IF Ematch=0 .AND. Hmatch=0 .AND. Omatch=0 .AND. IMATCH=0
?? 'All'

ENDIF

IF Ematch=1
?? 'Exact, '

ENDIF

IF Hmatch=1
?? 'Human, '
ENDIF

IF Omatch=1

?? 'Other sp. '

ENDIF

IF Imatch=1
?? 'INCYTE'
ENDIF

IF Xmatch=1
?? 'EST'
ENDIF

IF CONDEN=1

? 'Condensed format analysis'

ENDIF

IF ANAL=1

? 'Sorted by NUMBER'

ENDIF

IF ANAL=2

? 'Sorted by ENTRY'

ENDIF

IF ANAL=3

? 'Arranged by ABUNDANCE'
ENDIF

IF ANAL=4

? 'Sorted by INTEREST'

ENDIF

IF ANAL=5

? 'Arranged by LOCATION'
ENDIF

IF ANAL=6

? 'Arranged by DISTRIBUTION'
ENDIF

IF ANAL=7

? 'Arranged by FUNCTION'
ENDIF

? 'Total clones represented: '

?? STR(STARTOT,6,0)

? 'Total clones analyzed: '

?? STR(ANALTOT,6,0)

? 

.V,'l library d = designation f = distribution z = location r = function c = cer 

uirT^^ii; 

^ ^ "^-^^^^ 1- 40,2 SIZE 286,492 PIXELS FOOT -Geneva-' 7 COLOR 0,0,0, 

CASE ANAIj=1 

* Bort/number 
SET HEADINQ QM 
IP OCNKNal 

SORT TO TE^4Pl ON ENTRY, NUMBER 
DO -CCMPRSSSroN number. PRG' 
ELSE 

SORT 10 TEtiPl W NUM33R 
USE TEMPI 

number,L,D.P,Z,R,C,SNTRy,S,DESCRIPIOR 
ClSIS dSaIJ^ «™i^'L,D,P,S,R,C,E^3TRY,S,DSSCRIPTOR,I^^ 
ERASE TEMPI. DBF 

CASE ANALs2 

* aort/DESCRIPTOR 
SET HEADING ON 

2S ™ DESCRIPTOR, ENTRY,NUMBER/S for Dn'S' .OR.D='f" .OR.D='0' OR D-»X» OR n=*T. 

c^JPJ^^ ON ENmY,DSSCRIPTOR,NUMEER/S for D» 'E" OR S'H' oLS'O' ot'd^'X' 'S^D^ 
^ ENraY,START/S for D= ^E' .OR.D='H' .OR.D=•6^oS.^):'i^^^^ ^ 

DO "COMPRESSiaN entry. PRO* 
USE TEMPI 

E RASE TQ^l.DBF 
ENDIF 
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CASE MaL=3 

* sort 3by abundance 
SET HSADINQ CN 

SORT TO TEMPI ON ENTRY^NUMBER for D='E' .OR.D='H' .OR.D= '0* .OR.Dx 'X' ,OR.D=' I '* 

DO "CtwpRESsiQN abundance. PRG" 

* sort/interest 
SET HEADING 0^ 
IP CONDENsl 

SORT TO TEMPI ON ENTOYr NUMBER FOR I>0 

DO "CQMPRESSIQN interest , PRG " 

ELSE 

SORT ON I/D,EtTOY TO TE^4P1 FDR I>1 
USB TEMPI 

CtOSE^^nATABASES ^' ^' ^' ^' ^^NTRY, S, DESCRIPTOR, LENGTKi RFEND, INIT, 1 

ERASE TEMPI, DBF 
ENDIF 

CASE ANALsS 

* arrange/location 
SET HEADING ON 
STOR3 4 TO AMPLIFIER 
? 'Nuclear: ' 

?2^™JP?^^'^^^ RFE^'NUMBER,L,D,F.2,R,C,EtmiY,S,DESCRIProR.LH^^ 

DO "Conpression location. prg" 
ELSE 

DO "Noxmal subroutine 1" 
EKDIF 

? 'Cytoplasmic:* 

irOaSS^^'^"^^^^ ^END,tJUlffiER,L.D,P,Z,R,C,ENW,S,DESC3lIPTDR,i^TH,^ 
DO "Ccnpresaion location. prg" 

DO "Normal subroutine 1* 
E^©IP 

? 'Cycbskeleton: ' 

gW^^^^HY.NUMBER FIELDS RFETTO^HUMBER, L,D, F, 2, R,C, ENTRY, S, DESCRIPTOR, LE>TO^^ 

DO ""Ccitpression location. org" 
ELSE 

DO •Normal subroutine 1" . 
£XvDIF 

? 'Cell surface: • 

fS^^l^l^?^^'^'"^®^ ^^^^ 5^2OT,NUMEER,L,D,r,2,R,C,EJTO^Y,S,DESCRlPTOR,L2t«STK,IM^^ 

DO "Compression location. prg" 
ELSE 

DO "Kotmal eubroucine 1" 
ENDIF 

? 'Intracellular roenibrane: ' 

fS^^LJSJ*?^^'^'^^"^ FIELDS R?S^©,NUMBER,L,D,F,2,R,C,ENTRY,S,DESCRIPTOR,LE^rcTH,INIT,I,CQM^CT 
DO ■Conpression location.prg" 

DO "Norroal subroutine 1" 
ENDIP 

? 'Mitochondrial:' 

^^OcSj^^'^'""^ FIELDS RFEND,NUZfflER.L,D,P,2,R,c;EbTOY,S,DESCRIPTOR,LENCn^^ 
DO *CcffrpreBsion location. prg- 
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7 'SecretQdi' 

DO "cauDfeasioa location. pro* 
DO "Normal sutoroutlne 1" 
? 'Otheri* 

f?a£F^''''™^ "^"^ R2'fiND,NUMBER,L,D.P,2,K,C,Emy,S,DESCRlPTOR,r^ 

DO ■Coirprdssion location. pro" 
ELSE 

DO ••Normal subroutine 1» 
? 'XftUoicwni' 

^^SSST^''^'^ ^^END,NL7ffiER,L,D,F,Z,R,C,Er7rRY,S,DESCRI^^^ 

DO "CoBipression location .prg" 
ELSE 

DO •Normal subroutine !• 
ENDIF 

IF C0NDSN=1 

SST DEVICE. TO PRINTER 

SET PRINTER ON ' 

EJECT 

DO •Output heading. prg' 
USB •Ana-lysis location.dbf • 
DO "Create bargraph.prg' 
SET .HEADIKO OFF 

I • FUNCTIONAL CtASS . TOTAL UNIQUE NEW % TOTAL' 

SSe^eJtSSeS 
ERASE TEMP2.DBF 
SET HEADINQ ON , 

'^i^OyiyiToi^B^S^^/mcifox files jTEMFMASTER.dbf 

CASE ANAL=6 

* arrange/aistribution 

SET HEADING ON 

STORE 3 TO AMPLIFIER 

? 'Cell/ciflsue specific distributions' 

gOT^Tl^^NU^ FIEU)S RPEND,NUMBER,L,D,F,Z.R,C,ENTRY,S.DESCRIPTOR^ 

DO "Coopression disnrib.prg" 
ELSE 

DO •Normal subroutine 1" 
ENDIF 

7 'Non-specific diBtributlonj ' 

^^^^^^^^^^'^^ -'laDS KFErro,NMBER,L,D,F,2,R,C,EOTRV,S,DS5CRr^^ 

DO •Coapression diBtrib.prg' 
ELSE 

DO "Normal fiutaroutina !• 
atDIF 

• 7 'Unknown distribution: ' 
IF^CoSS^^'^'™^-^^^^ I™3'NtJMBER,L,D,F,2,R,C,E?Tn^Y,S,DESCRI^^^ 

DO "Ccffrpression distrib.prg- 
ELSE 

DO "Normal subroutine !• 
ENDIF 

IP OQNDENel 

SET DEVICE TO PRINIER 

SET PRINTER ON 
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EJECT 

DO "Output headinsr.prg' 

US3 "Analysis distribution.dbf • 

DO ■Create bargraph.prg" 

S5T HEADING OFF 

? FUNCTIONAL CLASS TOTAL UNIQUE % TOTAL' 

LIST OFF FIELDS P. NAME,CLQNES,GENES, PERCENT, GRAPH 
CLOSE DATABASES 
ERASE TE^2.DBF 
SET HEADING ON 

*USE "SmartGuytFoxBASE+ZMacifox files :TEMPMASTSR.dbf- 
aiDIF 
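
The condensed reports above end with a summary table whose columns are headed FUNCTIONAL CLASS, TOTAL, UNIQUE and % TOTAL. The routine that fills that table ("Create bargraph.prg") is not reproduced in this part of the listing, so the following Python sketch only illustrates how such totals could be derived; the function name, the input layout and the rounding are assumptions, and the bar graph column is omitted because its scaling by AMPLIFIER is not shown.

```python
def class_summary(rows, total_clones):
    """Summarise clone counts per class.

    rows: (class_name, entry, clone_count) tuples, one per unique gene.
    Returns (class_name, total_clones_in_class, unique_genes, percent_of_total).
    """
    summary = {}
    for name, entry, count in rows:
        clones, genes = summary.get(name, (0, set()))
        genes.add(entry)
        summary[name] = (clones + count, genes)
    return [
        (name, clones, len(genes), round(100 * clones / total_clones, 1))
        for name, (clones, genes) in summary.items()
    ]
```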

CASE ANAL=7 

* arrange/function 

SET HEADING ON 

STORE 10 TO AMPLIFIER 

^ ' BINDING PROTEINS' * 

? 'Surface molecules and receptors 

IP^oSeSl^''*^^ I^^™D'NU^ffiHn,L,D,F,Z,R,C,E^Ry,S,^^^^ 

DO •Con^ression functlon-prg" 
T:*T.ef 

DO •Noiroal subroutine 1" 

? ' Calcium- binding proteins: ' 

IF^OcSS^^'^^^ FIELDS RFEND,NU^BER,L,D,F,Z,R,C,ENTRY,S,DESCRIPTCR,LE:qG^ 

DO ■CaTOression function .prg" 
ELSE 

DO 'Normal subrbutine 1" 
ENDIF 

? 'Ligands and effectors i' 

S^CcSESa*"^^'^™^^ I^2^'N«MBER,L,D.F,Z,R,C,mrRy,S,D2SCRIFTOR,LSKGra,INI^ 

DO "Ccmpression f unction. prg" 
ELSE 

DO "Noxmal sxihroutine 1" 
ENDIF 

? 'Other binding proceins J ' 

^^^^^^^^Y,NUMBER FIELDS RFEND,NU^fflER, L,D,F, 2, R,C, ENTRY, S, DESCRIPTOR, Lm?IS.INIT, I, C(^^ 

DO "Compression function •prg" 
ELSE 

DO ■Normal subroutine 1 " 

ENDIP 

•EJECT 

? ' CNCOQENES' 
? . 
? 'General oncogenes:' 

fS^L^^?*™^'*™^^ FIELDS RPEND, NUMBER, L,D,P,Z,R,C, ENTRY, S, DESCRIPTOR, LENE3TH,INI^ 
jlF OQNDENsl 

DO •Conpression iunction.prg" 
ELSE 

DO ■Normal subroutine 1" 



7 'GTP-binding proteins* * 

^^j^^^^Y, NUMBER FIELDS RPEI©, NUMBER, L, D, P,Z, R, C, ENTRY, 3, DESCRIPTOR, LENGTO.INIT, I, C©^ 

DO ••Conpression function. prg' 
ELSE 

DO 'Nonnal subroutine 1' 
ENDIF 

7 'Viral elements I • 


DO "CarpresBion function. prg' 

DO "Normal subroutine 1" 
ENDIF 

? 'Kinases and Phosphatases:' 

DO "Cdinpreasion function. prg* 
££SE 

DO "Norml subroutine 1" 
7 'Tumor-related antigens i ' 

?f SS^""'^^-"^ i^'NU>bbr,l,d,f,z:r,c,entos,descriptor,ls^k^ 

DO "Corapresaion function. prg' 
ELSE 

DO "Noznal subroutine !■ 

SNDIF 

*EJECT 

^ ' PROTEIN miHETIC MACH1N2RY PROTEINS' 

? 'Transcription and Nucleic Acid-binding proteins: • 

r^^^oSffi^^'^^^^ ^^^'NUMB3R,L,D,F,2,R.C,SNTOv,s,DES(niIPT0R 

DO •Coo^jression function »prg* 
SLSE 

DO 'Normal subroutine 1" 

EMDIF 

? 'Translation: ' 

IF^CoKS^^'^^ ^"D'N^ER^L,D,F,Z,R,C,EhW,S,DESCRIPTOR,I^^ 

DO 'Con^esaion function. prg" 
ELSE 

DO "Normai subroutine 1* 
5M>IF 

7 *Ribosotnal proteins: ' 

IF^CoSS^^'^'^^^ ^^^^^ P^^^'J«^'L,D,F,Z,R,C,EITOtY,S,DESCRIPTOR,r^ 

DO •Conpressioa function. prg" 
ELSE 

DO "Normal sxibroutine 1' 
ENDIF 

7 'Protein processing! ' 

rP^CaSS^^'^^^"^ ^^^^ R5^'^"UMBSR,L,D,F.2,R,C,Erm^y,s,D 

DO ■Compression function. prg". 
ELSE 

DO 'Normal subroutine 1* 
mjDIF 

' ' EN2XMES' 
? 

? 'Perroproteinsi ' 

IF^COm^^'^^^ ^^^^ ^*J^'l™HER,L.D,F,2,R,C,ENTRy,S,DESCRlPTOR,LENC3^ 
DO "CGinpression function. prg" 



DO •Normal subroutine 1" 
ENDIF 

? 'Proteases and inhibitors:' 

IP^COotS^'^^^^"^ ^^^^ I^P2ND,NlJMBER,L,D.F,Z,R,C,EOTRy,S,DESCRIW^ 
DO "Conpressi n function. prg" 
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DO "Nontal subroutine 1" 

? 'Oxidative phosphorylation:' 

DO "Corrpreaaion function. pro* 
ECSS 

to "Normal subroutine i- 
©DIP 

7 ' Sugar ■jnetabolismi ' 

DO^ Conpression function »prg» 
DO ■Normal subroutine 1* 
? 'Amino acid metabolism: * 

DO "Compression function, prg' 
DO "Normal subroutine !• 
? 'KUcleic acid metabolismi • 

DO "Compression function. pro" 

ELSE . 

DO ^'Normal subroutine i» • 

7 'Lipid metabolism: ' 

S^'^oSS^'''^''^ ™' ^'^^'L'D'F,Z,R,C,E^m.y,S,DESCRIP10R,L^^^^ 

DO "Corr^jression function. pre" 
ELSE 

DO •Normal subroutine !■ 
ENDIP 

? ' Other enzymes i • » 

DO ■Compression function. pro" 
ELSE 

DO 'Normal subroutine 1" 

Q3DI? 

♦EJECT 

J ' MISCELLANEOUS CATETORIES ' 

7 'Stress "response: ' 

?faoSS^''''^^'=^ ^"^^^ '^•»«»fflE^'l''D.F.2.R.C,Dm^,S.DESCaiProR.LEKCmi,nOT.I,C^ 

DO •Coit?>ression functioh.prg" 
ELSS 

DO 'Normal subroutine 1" 
ENDI? 

7 • Structural J ' 

fJ'^JcS:^'''*^'''*' «^^''^HR.L.D.F.Z.R,C,TO.S,OESCRlPTOR,l^wn«,INIT,I,CX^ 

po •Coii5>ression f unction. pr^" 
ELSE 

DO "Noxmal subroutine 1" 
7 'Other clones i ' 

ELsci 
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DO 'Nonnol subroutine 1" 

? * Clones of unknown function:' 

SORT ON ENTRY^NCJMBER PIHLDS RFEND,NUM3E.S, L.D.Ti 2, R,C,2in5lY,S, DESCRIPTOR, 
IF CQNDENsl 

DO "Coirpreflsion function-prg" 
SLSS 

DO "Nonnal subroutine 1" 
ENDIP 

IF CONDEN=1
  EJECT
  *SET DEVICE TO PRINTER
  *SET PRINT ON
  DO "Output heading.prg"
  USE "Analysis function.dbf"
  DO "Create bargraph.prg"
  SET HEADING OFF
  ***
  SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 296,492 PIXELS FONT "Geneva",12 COLOR 0,0,0
  ? ' TOTAL TOTAL NEW DIST'
  ? ' FUNCTIONAL CLASS CLONES GENES GENES FUNCTIONAL CLASS'
  *LIST OFF FIELDS P,NAME,CLONES,GENES,NEW,PERCENT,GRAPH,COMPANY
  LIST OFF FIELDS P,NAME,CLONES,GENES,NEW,PERCENT,GRAPH
  CLOSE DATABASES
  ERASE TEMP2.DBF
  SET HEADING ON
  *USE "SmartGuy:FoxBASE+/Mac:fox files:TEMPMASTER.dbf"
ENDIF

CASE ANAL=8
  DO "Subgroup summary 3.prg"
ENDCASE

DO "Test print.prg"
SET PRINT OFF
SET DEVICE TO SCREEN
CLOSE DATABASES
*ERASE TEMPLIB.DBF
*ERASE TEMPNUM.DBF
*ERASE TEMPDESIG.DBF
*ERASE SELECTED.DBF
CLEAR
LOOP
ENDDO
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* CX)MPJ?2:SS10N SUBRCOTINE FOR ANALYSIS P?0C3RAM$ 

USE TEMP1
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2 = 0
DO WHILE SW2=0 ROLL
  IF MARK1 >= TOT
    PACK
    COUNT TO UNIQUE
    COUNT TO NEWGENES FOR D='H' .OR. D='O'
    SW2 = 1
    LOOP
  ENDIF
  GO MARK1
  DUP = 1
  STORE ENTRY TO TESTA
  SW = 0
  DO WHILE SW=0 TEST
    SKIP
    STORE ENTRY TO TESTB
    IF TESTA = TESTB
      DELETE
      DUP = DUP+1
      LOOP
    ENDIF
    GO MARK1
    REPLACE RFEND WITH DUP
    MARK1 = MARK1+DUP
    SW = 1
    LOOP
  ENDDO TEST
  LOOP
ENDDO ROLL
*GO TOP
STORE Z TO LOC
USE "Analysis location.dbf"
LOCATE FOR Z=LOC
REPLACE CLONES WITH TOT
REPLACE GENES WITH UNIQUE
REPLACE NEW WITH NEWGENES
USE TEMP1
SORT ON RFEND/D TO TEMP2
USE TEMP2
?? STR(UNIQUE,5,0)
?? ' genes, for a total of '
?? STR(TOT,5,0)
?? ' clones'
? ' V Coincidence'
list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT,I
*SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE TEMPDESIG
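
The compression subroutine collapses runs of consecutive records that share the same ENTRY value into a single record, stores the run length in RFEND, and then sorts by RFEND in descending order. A minimal Python sketch of that logic follows, assuming the records arrive already sorted by ENTRY (as the calling program's SORT ON ENTRY,NUMBER suggests). The function name `compress` and the sample records are hypothetical and are not part of the listing.

```python
from itertools import groupby


def compress(records):
    """Collapse consecutive records with the same ENTRY into one record.

    Mirrors the ROLL/TEST loops: the first record of each run is kept and
    its RFEND field is set to the number of records in the run.
    """
    compressed = []
    for _, run in groupby(records, key=lambda r: r["ENTRY"]):
        run = list(run)
        first = dict(run[0])       # keep the first record of the run
        first["RFEND"] = len(run)  # DUP: how many clones matched this entry
        compressed.append(first)
    # Equivalent of SORT ON RFEND/D: most abundant entries first.
    compressed.sort(key=lambda r: r["RFEND"], reverse=True)
    return compressed


# Hypothetical input, already sorted by ENTRY.
records = [
    {"NUMBER": 1, "ENTRY": "A"},
    {"NUMBER": 2, "ENTRY": "A"},
    {"NUMBER": 3, "ENTRY": "B"},
    {"NUMBER": 4, "ENTRY": "C"},
    {"NUMBER": 5, "ENTRY": "C"},
    {"NUMBER": 6, "ENTRY": "C"},
]
result = compress(records)
tot = len(records)    # TOT: total clones
unique = len(result)  # UNIQUE: distinct entries
print(unique, "genes, for a total of", tot, "clones")
```

Here `tot` and `unique` play the roles of TOT and UNIQUE in the listing; the sketch does not attempt to reproduce the DELETE/PACK record handling of the database engine.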
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* COMPRESSION SOBROOTINS FOR ANALYSIS PROGRAMS 
USE TEMP1
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2 = 0
DO WHILE SW2=0 ROLL
  IF MARK1 >= TOT
    PACK
    COUNT TO UNIQUE
    SW2 = 1
    LOOP
  ENDIF
  GO MARK1
  DUP = 1
  STORE ENTRY TO TESTA
  SW = 0
  DO WHILE SW=0 TEST
    SKIP
    STORE ENTRY TO TESTB
    IF TESTA = TESTB
      DELETE
      DUP = DUP+1
      LOOP
    ENDIF
    GO MARK1
    REPLACE RFEND WITH DUP
    MARK1 = MARK1+DUP
    SW = 1
    LOOP
  ENDDO TEST
  LOOP
ENDDO ROLL
*BROWSE
*SET PRINTER ON
SORT ON DATE TO TEMP2
USE TEMP2
?? STR(UNIQUE,4,0)
?? ' genes, for a total of '
?? STR(TOT,4,0)
?? ' clones'
?
? ' V Coincidence'
COUNT TO P4 FOR I=4
IF P4>0
  ? STR(P4,3,0)
  ?? ' genes with priority = 4 (Secondary analysis:)'
  list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT for I=4
ENDIF
COUNT TO P3 FOR I=3
IF P3>0
  ? STR(P3,3,0)
  ?? ' genes with priority = 3 (Full insert sequence:)'
  list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT for I=3
  ?
ENDIF
COUNT TO P2 FOR I=2
IF P2>0
  ? STR(P2,3,0)
  ?? ' genes with priority = 2 (Primary analysis complete:)'
  list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT for I=2
ENDIF
COUNT TO P1 FOR I=1
IF P1>0
  ? STR(P1,3,0)
  ?? ' genes with priority = 1 (Primary analysis needed:)'
  list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT for I=1
ENDIF
*SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
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♦ COMPRESSION SUBROUTINE FOR ANALYSIS PR0C3RAMS 
USE TEMP1
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2 = 0
DO WHILE SW2=0 ROLL
  IF MARK1 >= TOT
    PACK
    COUNT TO UNIQUE
    SW2 = 1
    LOOP
  ENDIF
  GO MARK1
  DUP = 1
  STORE ENTRY TO TESTA
  SW = 0
  DO WHILE SW=0 TEST
    SKIP
    STORE ENTRY TO TESTB
    IF TESTA = TESTB
      DELETE
      DUP = DUP+1
      LOOP
    ENDIF
    GO MARK1
    REPLACE RFEND WITH DUP
    MARK1 = MARK1+DUP
    SW = 1
    LOOP
  ENDDO TEST
  LOOP
ENDDO ROLL
*BROWSE
*SET PRINTER ON
SORT ON NUMBER TO TEMP2
USE TEMP2
?? STR(UNIQUE,4,0)
?? ' genes, for a total of '
?? STR(TOT,5,0)
?? ' clones'
? ' V Coincidence'
list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT
*SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
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* COMPRESSION SU3R0OTIN2 FOR ANALYSIS PROGRAMS 

USE TEMP1
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2 = 0
DO WHILE SW2=0 ROLL
  IF MARK1 >= TOT
    PACK
    COUNT TO UNIQUE
    COUNT TO NEWGENES FOR D='H' .OR. D='O'
    SW2 = 1
    LOOP
  ENDIF
  GO MARK1
  DUP = 1
  STORE ENTRY TO TESTA
  SW = 0
  DO WHILE SW=0 TEST
    SKIP
    STORE ENTRY TO TESTB
    IF TESTA = TESTB
      DELETE
      DUP = DUP+1
      LOOP
    ENDIF
    GO MARK1
    REPLACE RFEND WITH DUP
    MARK1 = MARK1+DUP
    SW = 1
    LOOP
  ENDDO TEST
  LOOP
ENDDO ROLL
GO TOP
STORE R TO FUNC
USE "Analysis function.dbf"
LOCATE FOR P=FUNC
*REPLACE CLONES WITH TOT
REPLACE GENES WITH UNIQUE
REPLACE NEW WITH NEWGENES
USE TEMP1
SORT ON RFEND/D TO TEMP2
USE TEMP2
SET HEADING ON
?? STR(UNIQUE,5,0)
?? ' genes, for a total of '
?? STR(TOT,5,0)
?? ' clones'
***
? ' V Coincidence'
list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH,INIT
*SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",12 COLOR 0,0,
*list off fields RFEND,S,DESCRIPTOR
*SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE TEMPDESIG
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* CCMPRBSSION SUBROUTINE FOR ANAI/YSIS PROGRAMS 
USE TEMP1
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2 = 0
DO WHILE SW2=0 ROLL
  IF MARK1 >= TOT
    PACK
    COUNT TO UNIQUE
    SW2 = 1
    LOOP
  ENDIF
  GO MARK1
  DUP = 1
  STORE ENTRY TO TESTA
  SW = 0
  DO WHILE SW=0 TEST
    SKIP
    STORE ENTRY TO TESTB
    IF TESTA = TESTB
      DELETE
      DUP = DUP+1
      LOOP
    ENDIF
    GO MARK1
    REPLACE RFEND WITH DUP
    MARK1 = MARK1+DUP
    SW = 1
    LOOP
  ENDDO TEST
  LOOP
ENDDO ROLL
GO TOP
STORE P TO DIST
USE "Analysis distribution.dbf"
LOCATE FOR P=DIST
REPLACE CLONES WITH TOT
REPLACE GENES WITH UNIQUE
USE TEMP1
sort on rfend/d to TEMP2
USE TEMP2
?? STR(UNIQUE,5,0)
?? ' genes, for a total of '
?? STR(TOT,5,0)
?? ' clones'
? ' V Coincidence'
list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR
*SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE TEMPDESIG
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♦ COMPRESSION SUBROOTINE FOR ANALYSIS PROGRAMS 

USE TEMP1
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2 = 0
DO WHILE SW2=0 ROLL
  IF MARK1 >= TOT
    PACK
    COUNT TO UNIQUE
    SW2 = 1
    LOOP
  ENDIF
  GO MARK1
  DUP = 1
  STORE ENTRY TO TESTA
  SW = 0
  DO WHILE SW=0 TEST
    SKIP
    STORE ENTRY TO TESTB
    IF TESTA = TESTB
      DELETE
      DUP = DUP+1
      LOOP
    ENDIF
    GO MARK1
    REPLACE RFEND WITH DUP
    MARK1 = MARK1+DUP
    SW = 1
    LOOP
  ENDDO TEST
  LOOP
ENDDO ROLL
GO TOP
USE TEMP1
?? STR(UNIQUE,5,0)
?? ' genes, for a total of '
?? STR(TOT,5,0)
?? ' clones'
? ' V Coincidence'
list off fields number,RFEND,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,LENGTH
*SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
USE TEMPDESIG
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Tte^^SF^^^®' SUBROOTIKS FOR ANW^VSIS PROGRAMS 
USE TEMPI 

OOUOT TO IDGENE FOR D»'E' .OR.Db'O'.OR D-»H' op tv,.m» od r^.r,. «^ 

^ FOR D..». .0K.,>..D. :S:£.S. :S:£.S. :o^:£:j : .or..,.v. 

COOOT TO TOT 

REPLACE ALL RPEND WITH 1 

MARKl 1 

SW2=0 

DO WHILE SW2aO ROLL 
IP MARKl >= TOT 
PACK 

OOtOT TO WnOUE 

LOOP 
' ENDIF 
GO MARKl 
DUP B 1 

STORE ENTRY TO TESTA 
SW B 0 

DO WHILE SWaO TEST 
SKIP 

STORE EfJTRY TO TESTS 

IP TEST?^ X3 TESTS 

DELETE' 

DUP = DUP+l 

LOOP 

ENDIF 
GO MARKl 

REPLACE RFEND WITH DUP 
MARKl = MARKl+DUP 

LOOP 

ENDDO TEST 
LOOP 

ENDDO ROI4L 
♦BROWSE 

*SE7r PRINTER ON 

SORT ON RFEND/D, NUMBER TO TEMP2 
USE TEMP2 

II il^^^* ^^^r'a total of • 

?? STR(TOT,5,0) 
7? ' clones' 

CI/5SE DATABASES 
ERASE TEMPI. DBF 
ERASE TEM?2,DBP 

USE •SmartGuy:FoxBASEt/Mac:fox f lies: clones. dbf • 
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* CXMPRSSSION SUBROUTINE FOR ANALYSIS PROGRAMS 
USE TEMP1
COUNT TO IDGENE FOR D='E' .OR. D='O' .OR. D='H' .OR. D='N' .OR. D='R' .OR. D='A'
DELETE FOR D='N' .OR. D='D' .OR. D='A' .OR. D='U' .OR. D='S' .OR. D='M' .OR. D='R' .OR. D='V'
PACK
COUNT TO TOT
REPLACE ALL RFEND WITH 1
MARK1 = 1
SW2 = 0
DO WHILE SW2=0 ROLL
  IF MARK1 >= TOT
    PACK
    COUNT TO UNIQUE
    SW2 = 1
    LOOP
  ENDIF
  GO MARK1
  DUP = 1
  STORE ENTRY TO TESTA
  SW = 0
  DO WHILE SW=0 TEST
    SKIP
    STORE ENTRY TO TESTB
    IF TESTA = TESTB
      DELETE
      DUP = DUP+1
      LOOP
    ENDIF
    GO MARK1
    REPLACE RFEND WITH DUP
    MARK1 = MARK1+DUP
    SW = 1
    LOOP
  ENDDO TEST
  LOOP
ENDDO ROLL
*BROWSE
*SET PRINTER ON
SORT ON RFEND/D, NUMBER TO TEMP2
USE TEMP2
REPLACE ALL START WITH RFEND/IDGENE*10000
?? STR(UNIQUE,5,0)
?? ' genes, for a total of '
?? STR(TOT,5,0)
?? ' clones'
? ' Coincidence V V Clones/10000'
set heading off
SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
list fields number,RFEND,START,L,D,F,Z,R,C,ENTRY,S,DESCRIPTOR,INIT,I
*SET PRINT OFF
CLOSE DATABASES
ERASE TEMP1.DBF
ERASE TEMP2.DBF
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
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USB TEMPI 
COUOT TO TOT 

?7 • Total of 

?? STR(TOr,4,0) 
?v ' clonee' 
7 

nist off fields nuiiiber,L,D,F,Z,K,CrEMrRY,nESCRIPTOR,LENGTH,RF^ 
list Off fieldfl nuihber»L,D,F,2iR,C,ENTRy,DESCRIPTDR 
CLOSS D;itABASES 
ERASE. TOMPl. DBF 
USE TEMPDSSZO 






*Lifeseq menu; version 8-7-94
SET TALK OFF
set device to screen
CLEAR
USE "SmartGuy:FoxBASE+/Mac:fox files:clones.dbf"
STORE LUPDATE() TO update
GO BOTTOM
STORE RECNO() TO cloneno
STORE 6 TO Chooser
DO WHILE .T.
* Program.: Lifeseq menu.fmt
* Date....: 1/11/95
* Version.: FoxBASE+/Mac, revision 1.10
* Notes...: Format file Lifeseq menu

f*^Sl^ TF^K^ HEADIN3 -Screen 1" AT 40,2 SIZE 286,492 PIXELS FOOT -Geneva- 2GS COLftTi n n 

t 11^^ ^^'^^^ ^ "^^'^es STn.E 2B479 COLOR 32767,-25600,-17-^6223^1672^^^1^^ ' ' 

I D?^f il^'A^ ^ ^^^'^^'^ 3871 COLOR 0,0,-l,.|l606,-i," ' 

I fS^f il'h^i ^ -LiraSEQ" STi^S 65536 FONT 'Geneva', 536 COLOR 0,0,-1,-1 7135 5BS4 

t pSSi Ititl f^v ^ fP?^ ^ -Genev^M2 COLOR 0,0,-1, :i:7l35,58ir' 

I fSi 11:111 g"25!f?Jl7"^M^^ SS^^;o^2ll§O^J^ -HelveticaMS COLOR 0,0,0, 

fl ^^M^ upcaaCS!" STlfLE 65536 FOOT •GenevaM2 COLOR 0,6,-1-1 -i;-l 

o fSS'^ ^Z^''*^ ""^^^^ clones:- ffm,E 6S536 FOOT "Geneva M2 C^OR 6 6 -1 -1 Zl Zr 
9 PIXELS 45,296 SAY -vl.SO- STi^ 65536 •Geneva-,782 C0l6r O™,-!?:!,^^ 

* EOF: Lifeseq menu.fmt
READ
DO CASE
CASE Chooser=1
  DO "SmartGuy:FoxBASE+/Mac:fox files:Output programs:Master analysis 3.prg"
CASE Chooser=2
  DO "SmartGuy:FoxBASE+/Mac:fox files:Output programs:Subtraction 2.prg"
CASE Chooser=3
  DO "SmartGuy:FoxBASE+/Mac:fox files:Output programs:Northern (single).prg"
CASE Chooser=4
  USE "Libraries.dbf"
  BROWSE
CASE Chooser=5
  DO "SmartGuy:FoxBASE+/Mac:fox files:Output programs:See individual clone.prg"
CASE Chooser=6
  DO "SmartGuy:FoxBASE+/Mac:fox files:Libraries:Output programs:Menu.prg"
CASE Chooser=7
  CLEAR
  SCREEN 1 OFF
  RETURN
ENDCASE
LOOP
ENDDO
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ei,30 SAY 'Database Subset Analysis" srVLE 

?
?
?
? DATE()
?? ' '
?? TIME()
? 'Clone numbers '
?? STR(INITIATE,6,0)
?? ' through '
?? STR(TERMINATE,6,0)
? 'Libraries: '
IF ENTIRE=1
  ? 'All libraries'
ENDIF
IF ENTIRE=2
  DO WHILE .T.
    IF MARK>STOPIT
      EXIT
    ENDIF
    USE SELECTED
    GO MARK
    ? ' '
    ?? TRIM(libname)
    STORE MARK+1 TO MARK
    LOOP
  ENDDO
ENDIF
? 'Designations: '
IF Ematch=0 .AND. Hmatch=0 .AND. Omatch=0
  ?? 'All'
ENDIF
IF Ematch=1
  ?? 'Exact,'
ENDIF
IF Hmatch=1
  ?? 'Human,'
ENDIF
IF Omatch=1
  ?? 'Other sp.'
ENDIF
IF CONDEN=1
  ? 'Condensed format analysis'
ENDIF
IF ANAL=1
  ? 'Sorted by NUMBER'
ENDIF
IF ANAL=2
  ? 'Sorted by ENTRY'
ENDIF
IF ANAL=3
  ? 'Arranged by ABUNDANCE'
ENDIF
IF ANAL=4
  ? 'Sorted by INTEREST'
ENDIF
IF ANAL=5
  ? 'Arranged by LOCATION'
ENDIF
IF ANAL=6
  ? 'Arranged by DISTRIBUTION'
ENDIF
IF ANAL=7
  ? 'Arranged by FUNCTION'



; FONT "Geneva", 274 COLOR 0,0,0,-1,-1,-1 
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ENDIF 

? 'Total clones represented: '
?? STR(STARTOT,6,0)
? 'Total clones analyzed: '
?? STR(ANALTOT,6,0)
?
?
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USE TEHPl 
COONT TO TOT 
?? ' Total of 
?? STO{IOT,4,0) 
?? * clones' 



1*}^?*^ ^t^J^fl^ auittoer,L,D,F,Z,R,C,ENTRY,02SCRlPTOR,LEI«3TH,RFS^ 

liat off fields nuinber,L,D,F,z,R,c, Earner, descriptor 



CLOSE DATABASES 
E3»lSE TEMPI. DBF 
VSB Te^PDSSZG 
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USB TEMPI 
OOCNT TO TOT 

• Total of 
?? STR(TOT,4,0) 
?? ' clones! 

*Ust off fields nu2nber,L,D,P,Z,R,C,ESTRr,DSSCRIITOR,I£NGm,RFEN^ 
list off fields nuinber,ti,D,P, 2, R,C,EWTRV, DESCRIPTOR 
CLOSE OA,TABASSS 
ERASE T^1,DB? 
USE TE^mSSSIG 
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♦Northern (single), version 11-25-94 
close databases
SET TALK OFF
SET PRINT OFF
SET EXACT OFF
STORE ' ' TO Eobject
STORE ' ' TO Dobject
STORE 0 TO Numb
STORE 0 TO Zog
STORE 1 TO Bail
DO WHILE .T.
* Program.: Northern (single).fmt
* Date....: 8/ 8/94
* Version.: FoxBASE+/Mac, revision 1.10
* Notes...: Format file Northern (single)

SCREEN 1 TiTPE 0 HEADING "Screen 1- AT 40,2 S12E 286,492 PIXELS FONT "Geneya* 12 COLOR 0 fi 0 
e PIXELS 15,81 TO 46,39? STTLE 28447 COLOR 0,0,--l, ^256C0;-l7-l 
I g^^y ^ 192,422 STOiS 28447 COLOR 0,0,0, -25600,-1,-1 

t Hl'^^ "^^^ ^'^-^ ^5536 FOOT •GenevaM2 COLOR 0,0,0,-1,-1,-1 

I Eobject STUS 0 FOMT 'GenevaM2 SIZ3 15,142 COLOR 0,0,0,-1,-1,-1 

t ^ ^ "GenevaM2 COLOR ofo,6,li:-I -l 

« pobject STYi^ 0 PONT 'QenevaM2 SIZE 15,241 COLOR 0,0,0,-1,-1,-1 

I llk^L^^ "^^^^^ Northern search screen- STYLE 65536 ro>7r •Genwa-,274 OoLoR 0,0,- 

l ^ ^f^^ ^^5^^ -ChicagoM2 PICIUR2 -S^R ConcW/Bail out' SIZE 

.^2f'??,^«"S.^°?® ^5536 FOOT -GenevaMa COLOR 0,0,0,-1,-1-1 

ft on^^co^?S^.^^?^ ^ "GenevaM2 SIZE 15,70 COLOR 0, oi 0, ll, ^1, -1 ' 

PIXELS 80,152 SAir 'Eater any ONE of the follovdng:- STVLE 65536 FONT '•GeneyaM2 COLOR -1, 

* EOF: Northern (single).fmt
READ
IF Bail=2
  CLEAR
  screen 1 off
  RETURN
ENDIF
USE "SmartGuy:FoxBASE+/Mac:Fox files:Lookup.dbf"
SET TALK ON
IF Eobject<>' '
  STORE UPPER(Eobject) to Eobject
  SET SAFETY OFF
  SORT ON Entry TO "Lookup entry.dbf"
  SET SAFETY ON
  USE "Lookup entry.dbf"
  LOCATE FOR Look=Eobject
  IF .NOT.FOUND()
    CLEAR
    LOOP
  ENDIF
  BROWSE
  STORE Entry TO Searchval
  CLOSE DATABASES
  ERASE "Lookup entry.dbf"
ENDIF
IF Dobject<>' '
  SET EXACT OFF
  SET SAFETY OFF
  SORT ON descriptor TO "Lookup descriptor.dbf"
  SET SAFETY ON
  USE "Lookup descriptor.dbf"
  LOCATE FOR UPPER(TRIM(descriptor))=UPPER(TRIM(Dobject))
  IF .NOT.FOUND()
    CLEAR
    LOOP
  ENDIF
  BROWSE
  STORE Entry TO Searchval
  CLOSE DATABASES
  ERASE "Lookup descriptor.dbf"
  SET EXACT ON
ENDIF
IF Numb<>0
  USE "SmartGuy:FoxBASE+/Mac:Fox files:clones.dbf"
  GO Numb
  BROWSE
  STORE Entry TO Searchval
ENDIF
CLEAR
? 'Northern analysis for entry '
?? Searchval
?
? 'Enter Y to proceed'
WAIT TO OK
CLEAR
IF UPPER(OK)<>'Y'
  screen 1 off
  RETURN
ENDIF

* COMPRESSION SUBROUTINE FOR Library.dbf
? 'Compressing the Libraries file now...'
USE "SmartGuy:FoxBASE+/Mac:Fox files:libraries.dbf"
SET SAFETY OFF
SORT ON library TO "Compressed libraries.dbf"
* FOR entered>0
SET SAFETY ON
USE "Compressed libraries.dbf"
DELETE FOR entered=0
PACK
COUNT TO TOT
MARK1 = 1
SW2 = 0
DO WHILE SW2=0 ROLL
  IF MARK1 >= TOT
    PACK
    SW2 = 1
    LOOP
  ENDIF
  GO MARK1
  STORE library TO TESTA
  SKIP
  STORE library TO TESTB
  IF TESTA = TESTB
    DELETE
  ENDIF
  MARK1 = MARK1+1
  LOOP
ENDDO ROLL

* Northern analysis
CLEAR
? 'Doing the northern now...'
SET TALK ON
USE "SmartGuy:FoxBASE+/Mac:Fox files:clones.dbf"
SET SAFETY OFF
COPY TO "Hits.dbf" FOR entry=searchval
SET SAFETY ON
CLOSE DATABASES
SELECT 1
USE "Compressed libraries.dbf"
STORE RECCOUNT() TO Entries
SELECT 2
USE "Hits.dbf"
Mark=1
DO WHILE .T.
  SELECT 1
  IF Mark>Entries
    EXIT
  ENDIF
  GO MARK
  STORE library TO Jigger
  SELECT 2
  COUNT TO Zog FOR library=Jigger
  SELECT 1
  REPLACE hits with Zog
  Mark=Mark+1
  LOOP
ENDDO
SELECT 1
BROWSE FIELDS LIBRARY,LIBNAME,ENTERED,HITS AT 0,0
CLEAR
? 'Enter Y to print:'
WAIT TO PRINSET
IF UPPER(PRINSET)='Y'
  SET PRINT ON
  CLEAR
  EJECT
  ?? Searchval
  ? DATE()
  ?
  SCREEN 1 TYPE 0 HEADING "Screen 1" AT 40,2 SIZE 286,492 PIXELS FONT "Geneva",7 COLOR 0,0,0,
  LIST OFF FIELDS library,libname,entered,hits
  ?
  ?
  SELECT 2
  LIST OFF FIELDS NUMBER,LIBRARY,D,S,F,Z,R,ENTRY,DESCRIPTOR,RFSTART,START,RFEND
  SET PRINT OFF
ENDIF
CLOSE DATABASES
SET TALK OFF
CLEAR
DO "Test print.prg"
RETURN
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TABLE 6 



library

ADENINB01
ADRENOR01
AMLBNOT01
BMARNOT01
BMARNOT02
CARDNOT01
CHAONOT01
CORNNOT01
FIBRAGT01
FIBRAGT02
FIBRANT01
FIBRNGT01
FIBRNGT02
FIBRNOT01
HMC1NOT01
HUVELPB01
HUVENOB01
HUVESTB01
HYPONOB01
KIDNNOT01
LUNGNOT01
MUSCNOT01
OVIDNOB01
PANCNOT01
PITUNOR01
PITUNOT01
PLACNOB01
SINTNOT02
SPLNFET01
SPLNNOT02
STOMNOT01
SYNORAB01
TBLYNOT01
TESTNOT01
THP1NOB01
THP1PEB01
THP1PLB01
U937NOT01

libname

Inflamed adenoid
Adrenal gland (r)
Adrenal gland (T)
AML blast cells (T)
Bone marrow
Bone marrow (T)
Cardiac muscle (T)
Chin. hamster ovary
Corneal stroma
Fibroblast, AT 5
Fibroblast, AT 30
Fibroblast AT
Fibroblast, uv 5
Fibroblast, uv 30
Fibroblast
Fibroblast, normal
Mast cell line HMC-1
HUVEC IFN,TNF,LPS
HUVEC control
HUVEC shear stress
Hypothalamus
Kidney (T)
Liver (T)
Lung (T)
Skeletal muscle (T)
Oviduct
Pancreas, normal
Pituitary (r)
Pituitary (T)
Placenta
Small intestine (T)
Spleen+liver, fetal
Spleen (T)
Stomach
Rheum. synovium
T B lymphoblast
Testis (T)
THP-1 control
THP phorbol
THP-1 phorbol LPS
U937, monocytic leuk



number  library    d s f z r  entry    descriptor                rfstart  start  rfend
2304    U937NOT01  E H C C T  HUMEFIB  Elongation factor 1-beta  0        0      773
3240    HMC1NOT01  E H C C T  HUMEFIB  Elongation factor 1-beta  0        370    773
3269    HMC1NOT01  E H C C T  HUMEFIB  Elongation factor 1-beta  0        371    773
4693    HMC1NOT01  E H C C T  HUMEFIB  Elongation factor 1-beta  0        470    773
5959    HMC1NOT01  E H C C T  HUMEFIB  Elongation factor 1-beta  0        327    773
9139    HMC1NOT01  E H C C T  HUMEFIB  Elongation factor 1-beta  0        375    773
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WHAT IS CLAIMED IS.' 

1. A method of analyzing a specimen containing gene
transcripts, said method comprising the steps of:

(a) producing a library of biological sequences;

(b) generating a set of transcript sequences, where
each of the transcript sequences in said set is indicative
of a different one of the biological sequences of the
library;

(c) processing the transcript sequences in a
programmed computer in which a database of reference
transcript sequences indicative of reference biological
sequences is stored, to generate an identified sequence
value for each of the transcript sequences, where each said
identified sequence value is indicative of a sequence
annotation and a degree of match between one of the
transcript sequences and at least one of the reference
transcript sequences; and

(d) processing each said identified sequence value to
generate final data values indicative of a number of times
each identified sequence value is present in the library.
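
Step (d) of claim 1 amounts to tallying how often each identified sequence value occurs in the library. The following Python sketch illustrates one way such a tally could be computed; it is not the implementation disclosed in the specification. The representation of an identified sequence value as an (annotation, degree of match) pair, the function name, and the entry HUMACT are assumptions made for illustration only.

```python
from collections import Counter


def count_identified_values(identified_values):
    """Return a mapping: identified sequence value -> times seen in the library."""
    return Counter(identified_values)


# Hypothetical library: one (annotation, degree_of_match) pair per sequenced clone.
library = [
    ("HUMEFIB", "exact"),
    ("HUMEFIB", "exact"),
    ("HUMACT", "exact"),       # hypothetical entry name
    ("HUMEFIB", "non-exact"),
]

final_data = count_identified_values(library)
for value, times in final_data.most_common():
    print(value, times)
```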






2. The method of claim 1, wherein step (a) includes
the steps of:

obtaining a mixture of mRNA;
making cDNA copies of the mRNA;
isolating a representative population of clones
transfected with the cDNA and producing therefrom the
library of biological sequences.

3. The method of claim 1, wherein the biological
sequences are cDNA sequences.

4. The method of claim 1, wherein the biological
sequences are RNA sequences.

5. The method of claim 1, wherein the biological
sequences are protein sequences.

6. The method of claim 1, wherein a first value of
said degree of match is indicative of an exact match, and a
second value of said degree of match is indicative of a
non-exact match.

7. A method of comparing two specimens containing
gene transcripts, said method comprising:

(a) analyzing a first specimen according to the
method of claim 1;

(b) producing a second library of biological
sequences;

(c) generating a second set of transcript sequences,
where each of the transcript sequences in said second set
is indicative of a different one of the biological
sequences of the second library;

(d) processing the second set of transcript sequences
in said programmed computer to generate a second set of
identified sequence values known as further identified
sequence values, where each of the further identified
sequence values is indicative of a sequence annotation and
a degree of match between one of the biological sequences
of the second library and at least one of the reference
sequences;

(e) processing each said further identified sequence
value to generate further final data values indicative of a
number of times each further identified sequence value is
present in the second library; and

(f) processing the final data values from the first
specimen and the further identified sequence values from
the second specimen to generate ratios of transcript
sequences, each of said ratio values indicative of
differences in numbers of gene transcripts between the two
specimens.
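
The ratio computation recited in step (f) of claim 7 can be illustrated with a short Python sketch. It assumes that the final data values for each specimen are available as dictionaries mapping an identified sequence value to its count; the function name, the sample counts, the entry HUMACT, and the choice to report `None` when the second count is zero are all assumptions, not details taken from the specification.

```python
def transcript_ratios(first_counts, second_counts):
    """Ratio of first-specimen count to second-specimen count for each value.

    A value absent from one specimen is treated as a count of zero; the ratio
    is reported as None when the second count is zero (undefined ratio).
    """
    ratios = {}
    for value in sorted(set(first_counts) | set(second_counts)):
        a = first_counts.get(value, 0)
        b = second_counts.get(value, 0)
        ratios[value] = a / b if b else None
    return ratios


# Hypothetical final data values for two specimens.
first = {"HUMEFIB": 6, "HUMACT": 2}
second = {"HUMEFIB": 3, "HUMACT": 4, "HUMTUB": 1}

ratios = transcript_ratios(first, second)
for value, ratio in ratios.items():
    print(value, ratio)
```

A practical comparison would likely also normalize each count by the total number of clones in its library before taking ratios, so that libraries of different sizes can be compared; that refinement is omitted here for brevity.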

8. A method of quantifying relative abundance of mRNA
in a biological specimen, said method comprising the steps
of:

(a) isolating a population of mRNA transcripts from
the biological specimen;

(b) identifying genes from which the mRNA was
transcribed by a sequence-specific method;

(c) determining numbers of mRNA transcripts
corresponding to each of the genes; and

(d) using the mRNA transcript numbers to determine
the relative abundance of mRNA transcripts within the
population of mRNA transcripts.

9. A diagnostic method which comprises producing a
gene transcript image, said method comprising the steps of:

(a) isolating a population of mRNA transcripts from
biological specimen;

(b) identifying genes from which the mRNA was
transcribed by a sequence-specific method;

(c) determining numbers of mRNA transcripts
corresponding to each of the genes; and

(d) using the mRNA transcript numbers to determine
the relative abundance of mRNA transcripts within the
population of mRNA transcripts, where data determining the
relative abundance values of mRNA transcripts is the gene
transcript image of the biological specimen.

10. The method of claim 9, further comprising:

(e) providing a set of standard normal and diseased
gene transcript images; and

(f) comparing the gene transcript image of the
biological specimen with the gene transcript images of step
(e) to identify at least one of the standard gene
transcript images which most closely approximate the gene
transcript image of the biological specimen.

11. The method of claim 9, wherein the biological
specimen is biopsy tissue, sputum, blood or urine.

12. A method of producing a gene transcript image,
said method comprising the steps of:

(a) obtaining a mixture of mRNA;

(b) making cDNA copies of the mRNA;

(c) inserting the cDNA into a suitable vector and
using said vector to transfect suitable host strain cells
which are plated out and permitted to grow into clones,
each clone representing a unique mRNA;

(d) isolating a representative population of
recombinant clones;

(e) identifying amplified cDNAs from each clone in
the population by a sequence-specific method which
identifies gene from which the unique mRNA was transcribed;

(f) determining a number of times each gene is
represented within the population of clones as an
indication of relative abundance; and

(g) listing the genes and their relative abundance in
order of abundance, thereby producing the gene transcript
image.
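
Steps (f) and (g) of claim 12 describe counting how often each gene is represented among the clones and listing the genes in order of abundance. The Python sketch below shows one way to produce such a listing, under the assumption that each clone has already been reduced to a gene identifier. The function name, the sample identifiers other than HUMEFIB, and the tie-breaking by name are illustrative assumptions.

```python
from collections import Counter


def gene_transcript_image(clone_genes):
    """Return (gene, count, fraction) tuples, most abundant gene first.

    clone_genes holds one gene identifier per sequenced clone; fraction is
    the gene's share of all clones, used as a measure of relative abundance.
    """
    counts = Counter(clone_genes)
    total = sum(counts.values())
    # Sort by descending count; break ties alphabetically for stable output.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(gene, n, n / total) for gene, n in ordered]


# Hypothetical population of six clones.
clones = ["HUMEFIB", "HUMACT", "HUMEFIB", "HUMTUB", "HUMEFIB", "HUMACT"]

image = gene_transcript_image(clones)
for gene, n, fraction in image:
    print(f"{gene:10s} {n:3d} {fraction:6.1%}")
```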

13. The method of claim 12, also including the step
of diagnosing disease by:

repeating steps (a) through (g) on biological
specimens from random sample of normal and diseased humans,
encompassing a variety of diseases, to produce reference
sets of normal and diseased gene transcript images;

obtaining a test specimen from a human, and producing
a test gene transcript image by performing steps (a)
through (g) on said test specimen;

comparing the test gene transcript image with the
reference sets of gene transcript images; and

identifying at least one of the reference gene
transcript images which most closely approximates the test
gene transcript image.

14. A computer system for analyzing a library of
biological sequences, said system including:

means for receiving a set of transcript sequences,
where each of the transcript sequences is indicative of a
different one of the biological sequences of the library;
and

means for processing the transcript sequences in the
computer system in which a database of reference transcript
sequences indicative of reference biological sequences is
stored, wherein the computer is programmed with software
for generating an identified sequence value for each of the
transcript sequences, where each said identified sequence
value is indicative of a sequence annotation and a degree
of match between a different one of the biological
sequences of the library and at least one of the reference
transcript sequences, and for processing each said
identified sequence value to generate final data values
indicative of a number of times each identified sequence
value is present in the library.



15. The system of claim 14, also including:
library generation means for producing the library of
biological sequences and generating said set of transcript
sequences from said library.

16. The system of claim 15, wherein the library
generation means includes:

means for obtaining a mixture of mRNA;

means for making cDNA copies of the mRNA;

means for inserting the cDNA copies into cells and
permitting the cells to grow into clones;

means for isolating a representative population of the
clones and producing therefrom the library of biological
sequences.
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Axnp sequence toOowing Ser*°* end ocojs wItNn 
tfBitaTialn erf Axllp tfgt^xws hcyn otogy with hOe 
To deiete the compteta STE23 aequenoe And 
create the siB23d:HJH43iTutat)cn. potymeiasetftain 
reacticn {PCfl) primer* (5'-TCGGAAGACCTCAT- 
IC T 1 U C I C ATTTT WATTGCTO TGT/GATTG- 
TACTGAGAGTGCAC-3*: and S'-GCTACA^ACAGC- 
GTCQACTTGAATGCCXX5GACATCTTCQACTGT. 
GCGGTATrrCACA0CG-3T woro used to arnpfify 
the URA3 sequena of pRS3i6. and the reaction 
pfoduci was transtormed Wo yeast for on»^ gene 
replacement (R. Rothsteirv. Methods Emymd. W, 
281 (199lfl.Tocfeatethe«/7^-,i-fU?muationcan. 
tained on p114, a 5iW<t) Sef i (rognrwrt from p^j 
was cfcned rto pUCl9. end ^ intema* 4iM{b Hpa 
l-Xho I tragmertt W8S replaoed wftfi a La/? fragmriL 
To construct the $ts2St:XBJ2 al»e»e (a deleticn oor- 
>espcndno to 931 amrio acids) cantod on pl53. a 
LBJ2 tragmert was isad to reptace the 2J^ Pni 
l-Edl 36 1 fragment d S7E23. wtiicft ooara within a 
W-Kb Hnd B>-Bgr fi genomic frso'neni carried on 
pSP72 (Promege). To create YEpMFAJ, a 1.6^ 
Bam HI tragmeni containing MFA 1, *om pKK16 (K, 
Kuchler. R E. Stsne. J, Thomer. EMBOJ. 8, 3973 
(1 9B9D. was figated into tt« B»n HI ^e of YEp35l p. 
E HI. A. M. Mym, T.J. Koemer, A Tzagotoff, Keaif 
Z 163 0960]. 

24. J. Chant and I. Herskowitz, Cell 65, 1203 (1991). 

25. B. W. Matthews, Acc. Chem. Res. 21, 333 (1988). 

26. K. Kuct^. H. G. DoMrr^ J, TTx>mer; J. CaJ Stot 
120. 1203 09S3); R. Koing and C. P. HoOenbefg. 
MO J. 13. 3261 (1994J; C. Berkowor. D. Loayza, 
S. Mktiaefis, A«tf. Bicy. Ctf 5. 1 1 85 (1994X 

27. A. Bender and J. R. Prr^, flnoc. Natt Acad. Sci. . 
as A 86. 6976 (1 886): J. Chant. K. Cofrado, J. R. 
Pringle. I. Herskowitz. Csf 65. I2l3 <1991); S. 
Powers, E Gonzales. T. Christensen. J. Cutwrt, D. 
Bfoek. ajto.. p. 1225: K. O. Perk. J. Chant. I: Her- 
skowltt. WarufO 365. 269 (1993); J- Chant. TifBndi 

Genet 10, 328 (1994); and J. R Pringle. J. 

CeOBiot. 129. 751 (1995); J. Chant. M. M«chke. E 
MitcheO. L Herskowttz. J. R Pringte. P. 767. 

28. G. F. Sprague Jr., Methods Enzymol. 194, 77 
(1991). 

29. Single-letter abbreviations for the amino acid residues 
are as follows: A, Ala; C, Cys; D, Asp; E, Glu; F, 
Phe; G, Gly; H, His; I, Ile; K, Lys; L, Leu; M, Met; N, 
Asn; P, Pro; Q, Gln; R, Arg; S, Ser; T, Thr; V, Val; W, 
Trp; and Y, Tyr. 

30. A W303 1A dwTvatim. Sr262S (VCATa kUZ-S, 

/iES3^">US7 -KISS), was the perani strain tor the rrxjtaft 
search. SY2625 derMabws far the meting assays, ae- 
crated phemmone assays, and the piJse<tBse o9»-. 
immts ndUded tw falow^ stains: y49 (sfe22-T). 
Y115 frnte;A:.-Lajt3. Y142 >)rf;.vURA3). n73 
(^AidSJZ^ Y220 (ax^rrUW 5iB23&rtR43). Y221 
fste23A:.im3). Y231 (a^/d.vl£l£ «tB23A:mG), 
and rZ33 (sra23A.-l£L^ M^Ta derivativas d 
SY2625 kxAxM the (oQowIng strains: Y199 
(SY2625 made MATo), Y278 (5(e22-l), Yl95 
(mAiJ^ta/?). Y196 ifijd1t::L£UZ^ end Y197 
(uf 7;:Un43). The £G 1 23 (^MTa tou2 ura3 VpJcani 
h's4i genetic background was used to create a set of 
strains for enatysts of bud stte selection. EQl23dft- 
rttfsttves kiduded the fotowmg strains: Y175 
(ax/IA.7LaJC2), Y223 (Ax/f:.-UnA3). Y234 tftaZSA:: 
IHjB). and Y272 (axfJA-vLfLS s/e234.Mfl/2). 
AAATo derivatives of EG123 IrvAjded the tolowing 
strains: Y214 (EG123 made MATa) and Y2d3 
lfljtnt.':L£V2U AS strains were generated by meens 
cf standtard genetic or mdeaiar methods Irrvotving 
the approprtata constructs In particutar. the u/T 
sfe23 double mutant strains were creeled by cross- 
ing of the appropriate A«47e Sfa23 and MATa udl 
mutants, followed by sporuiatlon of the resuftant dip- 
loid and isolation of the doutile mutant from nonpe- 
rental tS-type tetrads. Gene dtenpUons were con- 
firmed with either PGR a Southern (DNA) analysis. 
31. p129 b a YEp3S2 fJ. E HI, A. M Myers. T. J. Ko- 
emff. A. Tzagdoff. Yeast 2, 163 (I686i)l plasmid con- 
taining a 5.5^ 5d 1 ts^mvit of pW. P151 was 
derived *om pi 29 by rsenion of • fnker at the Bq^ II 
site wtthh AXL r . wNch led to 01 ln-*ame insenion of 
the hemaggUlnln(IHA) epitope (D0yP^lMWgC2^ ■ 
between amho adds 854 and 855 of the A>a.f prod- 



uct pC225 Is a KS-t- (Stra:agene)p(asmid containing 
a 0.5-1* Bam W-Sst I fragment from pAKLf. Substi- 
tution rruations of the proposed actrve sle of Axil p 
were created with the use of pC225 ffid riie-spedllc 
nwtagenesis rMJfvfrig appropriate synlhelic ofgonu- 
cieotides ipxtUHeSA, S'-GTCCTCACAAAGCGCT- 
GCCAAACCGGC-3': axl1-€7lA, S'-AAGAATCAT- 
GTGCGCACAAAGGTGCGC-3'; and wdl-eriD, 5'- 
AAGAATCATGTGATCACAAAGGTGCGWT The 
mutations were conftmad by sequence malysis. Af- 
ter rrutagenesis. the 0.4-kb Bam HHwtsc I fragmer* 
from the nvtagenczed pC225 ptasmlds was trans- 
ferred into pA>a.T lOCTeateasetofpflSSieptesnKfe 
carrying diftorent AXLT afleles. p124 (aj<f;-H68A). 
pi 30 iflxn-SJlAi, and pi 32 03sf}'E7lO^, Smiwfy. a 
s«« of HA-lagged atetae carried on YEp3S2 wwB OB- 
atedrfler replacement of the pl51 Bot HMtec I 
fragment to generate pi 61 (a)rf;-£7lA). p162 (axfJ- 



^ Dav is. T. Favero, C. de Hoog. and S. Kkn fc^ 
comments on the manuscript Supported by a 
grant toCB. from the Natml Sciems «id 
neering Research Coux:^ of C:enada. Support lor 
M.N-A. was from a Caffomb Tobacco-Rtfated Dis- 
ease Researcti Program postdoctoni tadowship 
(4FT.00a3). 

22 June 1995; accepted 21 August 1995 



Quantitative Monitoring of Gene Expression 
Patterns with a Complementary DNA Microarray 

Mark Schena,* Dari Shalon,*† Ronald W. Davis, 
Patrick O. Brown‡ 

A high-capacity system was developed to monitor the expression of many genes in 
parallel. Microarrays prepared by high-speed robotic printing of complementary DNAs on 
glass were used for quantitative expression measurements of the corresponding genes. 
Because of the small format and high density of the arrays, hybridization volumes of 2 
microliters could be used that enabled detection of rare transcripts in probe mixtures 
derived from 2 micrograms of total cellular messenger RNA. Differential expression 
measurements of 45 Arabidopsis genes were made by means of simultaneous two-color 
fluorescence hybridization. 



The temporal, developmental, topographical, 
histological, and physiological patterns 
in which a gene is expressed provide clues to 
its biological role. The large and expanding 
database of complementary DNA (cDNA) 
sequences from many organisms (1) presents 
the opportunity of defining these patterns at 
the level of the whole genome. 

For these studies, we used the small flowering 
plant Arabidopsis thaliana as a model 
organism. Arabidopsis possesses many advantages 
for gene expression analysis, including 
the fact that it has the smallest 
genome of any higher eukaryote examined 
to date (2). Forty-five cloned Arabidopsis 
cDNAs (Table 1), including 14 complete 
sequences and 31 expressed sequence tags 
(ESTs), were used as gene-specific targets. 
We obtained the ESTs by selecting cDNA 
clones at random from an Arabidopsis 
cDNA library. Sequence analysis revealed 
that 28 of the 31 ESTs matched sequences 
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Stanford. Ca 94306. USA. 
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in the database (Table I). Three additional 
cDNAs from other organisms served as con- 
trols in the experiments. 

The 48 cDNAs, averaging ~1.0 kb, 
were amplified with the polymerase chain 
reaction (PCR) and deposited into indi- 
vidual wells of a 96-well microtiter plate. 
Each sample was duplicated in two adja- 
cent wells to allow the reproducibility of 
the arraying and hybridization process to 
be tested. Samples from the microtiter 
plate were printed onto glass microscope 
slides in an area measuring 3.5 mm by 5.5 
mm with the use of a high-speed arraying 
machine (3). The arrays were processed by 
chemical and heat treatment to attach the 
DNA sequences to the glass surface and 
denature them (3). Three arrays, printed 
in a single lot, were used for the experi- 
ments here. A single microtiter plate of 
PCR products provides sufficient material 
to print at least 500 arrays. 
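
The plate layout described above (48 cDNAs, each duplicated in two adjacent wells of a 96-well plate) can be sketched as follows. The row-major fill order is an assumption made here for illustration; it is chosen to agree with the position labels in Table 1, which run from a1,2 to h11,12. The function name `wells_for` is hypothetical.

```python
import string

ROWS = string.ascii_lowercase[:8]   # plate rows a-h
COLUMNS = 12                        # plate columns 1-12
REPLICATES = 2                      # each cDNA fills two adjacent wells

def wells_for(sample_index):
    """Return the two adjacent wells holding cDNA number sample_index (0-based)."""
    per_row = COLUMNS // REPLICATES            # 6 cDNAs per plate row
    if not 0 <= sample_index < len(ROWS) * per_row:
        raise ValueError("a 96-well plate holds 48 duplicated samples")
    row = ROWS[sample_index // per_row]
    first_col = (sample_index % per_row) * REPLICATES + 1
    return tuple(f"{row}{first_col + k}" for k in range(REPLICATES))

print(wells_for(0))    # wells of the first cDNA on the plate
print(wells_for(47))   # wells of the last cDNA on the plate
```

Keeping replicates in adjacent wells means that each pair is printed side by side on the array, which is what allows reproducibility to be judged spot by spot.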

Fluorescent probes were prepared from 
total Arabidopsis mRNA (4) by a single 
round of reverse transcription (5). The 
Arabidopsis mRNA was supplemented with 
human acetylcholine receptor (AChR) mRNA 
at a dilution of 1:10,000 (w/w) before cDNA 
synthesis, to provide an internal standard for 
calibration (5). The resulting fluorescently 
labeled cDNA mixture was hybridized to an 
array at high stringency (6) and scanned 
with a laser (3). A high-sensitivity scan gave 
signals that saturated the detector at nearly 
all of the Arabidopsis target sites (Fig. 1A). 
Calibration relative to the AChR mRNA 
standard (Fig. 1A) established a sensitivity 
limit of ~1:50,000. No detectable hybridization 
was observed to either the rat glucocorticoid 
receptor (Fig. 1A) or the yeast TRP4 
(Fig. 1A) targets even at the highest scanning 
sensitivity. A moderate-sensitivity scan 



[Fig. 1 image, panel A, "High sensitivity": columns 1 to 12; scale bar: Expression level (w/w)] 



of the same array allowed linear detection of 
the more abundant transcripts (Fig. 1B). 
Quantitation of both scans revealed a range 
of expression levels spanning three orders of 
magnitude for the 45 genes tested (Table 2). 
RNA blots (7) for several genes (Fig. 2) 
corroborated the expression levels measured 
with the microarray to within a factor of 5 
(Table 2). 

Differential gene expression was investigated 
with a simultaneous, two-color hybridization 
scheme, which served to minimize 
experimental variation inherent in the 
comparison of independent hybridizations. 
Fluorescent probes were prepared from two 
mRNA sources with the use of reverse 
transcriptase in the presence of fluorescein- and 
lissamine-labeled nucleotide analogs, 
respectively (5). The two probes were then 
mixed together in equal proportions, 
hybridized to a single array, and scanned 
separately for fluorescein and lissamine 
emission after independent excitation of the two 
fluorophores (3). 

To test whether overexpression of a single 
gene could be detected in a pool of total 
Arabidopsis mRNA, we used a microarray to 
analyze a transgenic line overexpressing the 
single transcription factor HAT4 (8). Fluorescent 
probes representing mRNA from 
wild-type and HAT4-transgenic plants were 
labeled with fluorescein and lissamine, 
respectively; the two probes were then mixed 
and hybridized to a single array. An intense 
hybridization signal was observed at the 
position of the HAT4 cDNA in the 
lissamine-specific scan (Fig. 1D), but not in the 
fluorescein-specific scan of the same array 
(Fig. 1C). Calibration with AChR mRNA 
added to the fluorescein and lissamine 
cDNA synthesis reactions at dilutions of 
1:10,000 (Fig. 1C) and 1:100 (Fig. 1D), 
respectively, revealed a 50-fold elevation of 
HAT4 mRNA in the transgenic line relative 
to its abundance in wild-type plants 
(Table 2). This magnitude of HAT4 
overexpression matched that inferred from the 
Northern (RNA) analysis within a factor of 
2 (Fig. 2 and Table 2). Expression of all the 
other genes monitored on the array differed 
by less than a factor of 5 between 
HAT4-transgenic and wild-type plants (Fig. 1, C 
and D, and Table 2). Hybridization of 
fluorescein-labeled glucocorticoid receptor 
cDNA (Fig. 1C) and lissamine-labeled 
TRP4 cDNA (Fig. 1D) verified the presence 
of the negative control targets and the 
lack of optical cross talk between the two 
fluorophores. 

Fig. 1. Gene expression monitored with the use of cDNA microarrays. Fluorescent scans represented in 
pseudocolor correspond to hybridization intensities [...] 
with the use of known concentrations of human AChR mRNA in [...] 
letters on the axes mark the position of each cDNA. (A) High-sensitivity [...] 
with fluorescein-labeled cDNA derived from wild-type plants. (B) Same array as in (A) but scanned at 
moderate sensitivity. (C and D) A single array was probed with a 1:1 mixture of fluorescein-labeled cDNA 
from wild-type plants and lissamine-labeled cDNA from HAT4-transgenic plants. The single array was 
then scanned successively to detect the fluorescein fluorescence corresponding to mRNA from wild-type 
plants (C) and the lissamine fluorescence corresponding to mRNA from HAT4-transgenic plants (D). (E 
and F) A single array was probed with a 1:1 mixture of fluorescein-labeled cDNA from root tissue and 
lissamine-labeled cDNA from leaf tissue. The single array was then scanned successively to detect the 
fluorescein fluorescence corresponding to mRNAs expressed in roots (E) and the lissamine fluorescence 
corresponding to mRNAs expressed in leaves (F). 

Fig. 2. Gene expression monitored with RNA 
(Northern) blot analysis. Designated amounts of 
mRNA from wild-type and HAT4-transgenic 
plants were spotted onto nylon membranes and 
probed with the cDNAs indicated. Purified human 
AChR mRNA was used for calibration. 
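
The HAT4 fold elevation discussed above can be recomputed from the calibrated w/w expression levels reported in Table 2. The following Python sketch is an illustration added here, not part of the original analysis; the helper names `parse_level` and `fold_change` are hypothetical, and the four input values are the HAT4 entries of Table 2.

```python
from fractions import Fraction

def parse_level(text):
    """Convert a w/w expression level written as '1:N' into a Fraction."""
    numerator, denominator = text.split(":")
    return Fraction(int(numerator), int(denominator))

def fold_change(reference, sample):
    """Fold elevation of `sample` relative to `reference`."""
    return float(parse_level(sample) / parse_level(reference))

# HAT4 levels from Table 2 (wild type vs. HAT4-transgenic).
microarray_fold = fold_change("1:8300", "1:150")
rna_blot_fold = fold_change("1:6300", "1:210")

print(round(microarray_fold, 1))                  # microarray estimate
print(round(rna_blot_fold, 1))                    # RNA blot estimate
print(round(microarray_fold / rna_blot_fold, 2))  # disagreement between methods
```

Under these inputs the two estimates differ by less than a factor of 2, in line with the agreement stated in the text.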

To explore a more complex alteration in 
expression patterns, we performed a second 
two-color hybridization experiment with 
fluorescein- and lissamine-labeled probes 
prepared from root and leaf mRNA, 
respectively. The scanning sensitivities for the 
two fluorophores were normalized by 
matching the signals resulting from AChR 
mRNA, which was added to both cDNA 
synthesis reactions at a dilution of 1:1000 
(Fig. 1, E and F). A comparison of the scans 
revealed widespread differences in gene 
expression between root and leaf tissue (Fig. 1, 
E and F). The mRNA from the light-regulated 
CAB1 gene was ~500-fold more abundant 
in leaf (Fig. 1F) than in root tissue 
(Fig. 1E). The expression of 26 other genes 
differed between root and leaf tissue by 
more than a factor of 5 (Fig. 1, E and F). 

The HAT4-transgenic line we examined 
has elongated hypocotyls, early flowering, 
poor germination, and altered pigmentation 
(8). Although changes in expression were 



Table 1. Sequences contained on the cDNA microarray. Shown is [...] the known or putative 
function, and the accession number of each cDNA [...] 
in this study matched a sequence in the database. NADH, reduced [...] 
dinucleotide; ATPase, adenosine triphosphatase; [...] nicotinamide adenine [...] 



Position 



CONA 



Function 



Accession 
number 



81.2 
a3, 4 
as. 6 
a7.6 
a9, 10 
all. 12 
b1.2 
63.4 
65.6 
b7.8 
b9. 10 
b11. 12 
CI. 2 

c3,4 

C5.6 

C7,8 

c9. 10 

C11. 12 

d1.2 

d3.4 

d5.6 

d7.8 

d9. 10 

d11. 12 

el. 2 

e3,4 

e5.6 

e7.6 

eg. 10 

ell. 12 

fl.2 

13,4 

t5,6 

f7,8 
f9. 10 
111,12 
91.2 
g3.4 

95.6 
97.8 

g9. 10 

911. 12 
hi. 2 
h3.4 
h5.6 
h7.8 
h9, 10 
h11. 12 



AChR 

EST3 

ESTB 

AAC1 

EST12 

EST13 

CABi 

EST17 

GA4 

EST19 

GBf-1 

EST23 

EST29 

GBf'2 

ES734 

EST35 

EST41 

EST42 

EST45 

H47T 

eST46 

EST49 

HAT2 

HAT4 

EST50 

HATS 

ES751 

HAT22 

EST52 

EST59 

KNAT1 

^60 

EST69 

PPH1 

EST70 

EST75 

EST78 

ROC1 

EST62 

Esrea 

EST84 

EST91 

EST96 

SARI 

EST100 

EST103 

7TiP4 



Human AChR 
Aclin 

NADH dehydrogenase 

AcUnI 

Unknown 

Aclin 

Chlorophyll a/b tmdrg 
PhosphogJycerate kinase 
GbbereHic add biosynthesis 

G-box tjinding factor 1 
Elongation factor 
Aldolase 

G-box binding factor 2 
Chloroplast protease 
Unknown 
Catalase 

Rat glucocorticoid receptor 

Unknown 

ATPase 

Homeobox-leucffw zipper 1 
Light handling complex 
Unknown 

Homeobox -leucine zipper 2 
Homeobojt-teucine zipper 4 
PhosphortoutoKinase 
Homeobox-leucine zpper 5 
Urdcnown 

Homeobox-leucirw zipper 22 
Oxygen evolving 
Uhknown 

Knotted'Mke homeobox 1 
RuBisCO smaD sutxjnit 
Translalion elongation factor 
Protein phosphatase 1 
Unknown 

CMoroplasl protease 

Unknown 

Cyctophflin 

GTP binding 

Unknown 

Unknown 

Uiknown 

Unknown 

Synaptobrevin 

IJghl harvesting complex 

Ughl harvestirkg complex 

Yeast tryptophan bkxsynlhesis 



H36236 
227010 
M20016 
U36594t 
T45783 
M85150 
T44490 
I_37126 
U35595t 
X63894 
X52256 . 
T04477 
. X63e95 
R87034 
T14152 
T22720 
Ml 4053 
U35596t 
J04185 
U09332 
704063 
t76267 
U09335 
M90394 
T04344 
M90416 
233675 
U09336 
T21749 
234607 
U14174 
XI 4564 
T42799 
U34803 
T44621 
T43698 
R65481 
114844 
X59152 
233795 
T45278 
T13832 
R64816 
lVI904ie 
218205 
X03909 
X04273 



observed for HAT4, large changes in 
expression were not observed for any of the 
other 44 genes we examined. This was 
somewhat surprising, particularly because 
comparative analysis of leaf and root tissue 
identified 27 differentially expressed genes. 
Analysis of an expanded set of genes may be 
required to identify genes whose expression 
changes upon HAT4 overexpression; 
alternatively, a comparison of mRNA populations 
from specific tissues of wild-type and 
HAT4-transgenic plants may allow 
identification of downstream genes. 

At the current density of robotic printing, 
it is feasible to scale up the fabrication process 
to produce arrays containing 20,000 
cDNA targets. At this density, a single array 
would be sufficient to provide gene-specific 
targets encompassing nearly the entire 
repertoire of expressed genes in the Arabidopsis 
genome (2). The availability of 20,274 ESTs 
from Arabidopsis (1, 9) would provide a rich 
source of templates for such studies. 

The estimated 100,000 genes in the human 
genome (10) exceeds the number of 
Arabidopsis genes by a factor of 5 (2). This 
modest increase in complexity suggests that 
similar cDNA microarrays, prepared from 
the rapidly growing repertoire of human 
ESTs (1), could be used to determine the 
expression patterns of tens of thousands of 
human genes in diverse cell types. Coupling 
an amplification strategy to the reverse 
transcription reaction (11) could make it 
feasible to monitor expression even in 
minute tissue samples. A wide variety of 
acute and chronic physiological and 
pathological conditions might lead to characteristic 
changes in the patterns of gene expression 
in peripheral blood cells or other easily 
sampled tissues. In concert with cDNA 
microarrays for monitoring complex expression 
patterns, these tissues might therefore 
serve as sensitive in vivo sensors for clinical 
diagnosis. Microarrays of cDNAs could thus 
provide a useful link between human gene 
sequences and clinical medicine. 

Table 2. Gene expression monitoring by microarray 
and RNA blot analyses; tg, HAT4-transgenic. 
See Table 1 for additional gene information. 
Expression levels (w/w) were calibrated with the use 
of known amounts of human AChR mRNA. Values 
for the microarray were determined from microarray 
scans (Fig. 1); values for the RNA blot were 
determined from RNA blots (Fig. 2). 

            Expression level (w/w) 
Gene        Microarray   RNA blot 
CAB1        1:48         1:83 
CAB1 (tg)   1:120        1:150 
HAT4        1:8300       1:6300 
HAT4 (tg)   1:150        1:210 
ROC1        1:1200       1:1800 
ROC1 (tg)   1:260        1:1300 

*Proprietary sequence of Stratagene (La Jolla, California). †No match in the database; novel EST. 
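
The agreement between the two methods in Table 2 can be checked row by row. The sketch below is an illustration added here; the helper names are hypothetical, and the ROC1 wild-type RNA blot value, which is hard to read in the source scan, is taken as 1:1800 (an assumption).

```python
def denominator(level):
    """Return N from a w/w expression level written as '1:N'."""
    return int(level.split(":")[1])

def agreement_factor(microarray, rna_blot):
    """Ratio (>= 1) between two measurements of the same transcript."""
    a, b = denominator(microarray), denominator(rna_blot)
    return max(a, b) / min(a, b)

# (gene, microarray, RNA blot) as read from Table 2.
# Assumption: the ROC1 wild-type RNA blot entry is read as 1:1800.
TABLE_2 = [
    ("CAB1", "1:48", "1:83"),
    ("CAB1 (tg)", "1:120", "1:150"),
    ("HAT4", "1:8300", "1:6300"),
    ("HAT4 (tg)", "1:150", "1:210"),
    ("ROC1", "1:1200", "1:1800"),
    ("ROC1 (tg)", "1:260", "1:1300"),
]

factors = {gene: agreement_factor(m, r) for gene, m, r in TABLE_2}
worst_gene = max(factors, key=factors.get)
worst = factors[worst_gene]
print(worst_gene, worst)
```

With these inputs the largest discrepancy is a factor of 5, which matches the statement in the text that RNA blots corroborated the microarray measurements to within a factor of 5.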
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Gene Therapy in Peripheral Blood 
Lymphocytes and Bone Marrow for 
ADA⁻ Immunodeficient Patients 

Claudio Bordignon,* Luigi D. Notarangelo, Nadia Nobili, 
Giuliana Ferrari, Giulia Casorati, Paola Panina, Evelina Mazzolari, 
Daniela Maggioni, Claudia Rossi, Paolo Servida, 
Alberto G. Ugazio, Fulvio Mavilio 

Adenosine deaminase (ADA) deficiency results in severe combined immunodeficiency, 
the first genetic disorder treated by gene therapy. Two different retroviral vectors were 
used to transfer ex vivo the human ADA minigene into bone marrow cells and peripheral 
blood lymphocytes from two patients undergoing exogenous enzyme replacement therapy. 
After 2 years of treatment, long-term survival of T and B lymphocytes, marrow cells, 
and granulocytes expressing the transferred ADA gene was demonstrated and resulted 
in normalization of the immune repertoire and restoration of cellular and humoral immunity. 
After discontinuation of treatment, T lymphocytes, derived from transduced peripheral 
blood lymphocytes, were progressively replaced by marrow-derived T cells in both 
patients. These results indicate successful gene transfer into long-lasting progenitor cells, 
producing a functional multilineage progeny. 



Severe combined immunodeficiency associated 
with inherited deficiency of ADA 
(1) is usually fatal unless affected children 
are kept in protective isolation or the immune 
system is reconstituted by bone marrow 
transplantation from a human leukocyte 
antigen (HLA)-identical sibling donor 
(2). This is the therapy of choice, although 
it is available only for a minority of patients. 
In recent years, other forms of therapy have 
been developed, including transplants from 
haploidentical donors (3, 4), exogenous 
enzyme replacement (5), and somatic-cell 
gene therapy (6-9). 

We previously reported a preclinical model 
in which ADA gene transfer and expression 

C. Bordignon, N. Nobili, G. Ferrari, D. Maggioni, C. Rossi, 
P. Servida, F. Mavilio, Telethon Gene Therapy Program 
for Genetic Diseases, DIBIT, Istituto Scientifico H. S. 
Raffaele, Milan, Italy. 

L. D. Notarangelo, E. Mazzolari, A. G. Ugazio, Department 
of Pediatrics, University of Brescia Medical School, 
Brescia, Italy. 

G. Casorati, Unità Immunochimica, DIBIT, Istituto 
Scientifico H. S. Raffaele, Milan, Italy. 

P. Panina, Roche Milano Ricerche, Milan, Italy. 

* To whom correspondence should be addressed. 



successfully restored immune functions in human 
ADA-deficient (ADA⁻) peripheral 
blood lymphocytes (PBLs) in immunodeficient 
mice in vivo (10, 11). On the basis of 
these preclinical results, the clinical application 
of gene therapy for the treatment of 
ADA⁻ SCID (severe combined immunodeficiency 
disease) patients who previously failed 
exogenous enzyme replacement therapy was 
approved by our Institutional Ethical Committees 
and by the Italian National Committee 
for Bioethics (12). In addition to evaluating 
the safety and efficacy of the gene therapy 
procedure, the aim of the study was to define 
the relative role of PBLs and hematopoietic 
stem cells in the long-term reconstitution of 
immune functions after retroviral vector-mediated 
ADA gene transfer. For this purpose, 
two structurally identical vectors expressing 
the human ADA complementary DNA 
(cDNA), distinguishable by the presence of 
alternative restriction sites in a nonfunctional 
region of the viral long-terminal repeat 
(LTR), were used to transduce PBLs and bone 
marrow (BM) cells independently. This procedure 
allowed identification of the origin of 

Field of the Invention 

This invention relates to a method and apparatus 
for fabricating microarrays of biological samples for 
large scale screening assays, such as arrays of DNA 
samples to be used in DNA hybridization assays for 
genetic research and diagnostic applications. 

Background of the Invention 

A variety of methods are currently available for 
making arrays of biological macromolecules, such as 
arrays of nucleic acid molecules or proteins. One 
method for making ordered arrays of DNA on a porous 
membrane is a "dot blot" approach. In this method, a 
vacuum manifold transfers a plurality, e.g., 96, 
aqueous samples of DNA from 3 millimeter diameter wells 
to a porous membrane. A common variant of this 
procedure is a "slot-blot" method in which the wells 
have highly-elongated oval shapes. 

The DNA is immobilized on the porous membrane by 
baking the membrane or exposing it to UV radiation. 
This is a manual procedure practical for making one 
array at a time and usually limited to 96 samples per 
array. "Dot-blot" procedures are therefore inadequate 
for applications in which many thousand samples must be 
determined. 

A more efficient technique employed for making 
ordered arrays of genomic fragments uses an array of 
pins dipped into the wells, e.g., the 96 wells of a 
microtitre plate, for transferring an array of samples 
to a substrate, such as a porous membrane. One array 
includes pins that are designed to spot a membrane in a 
staggered fashion, for creating an array of 9216 spots 
in a 22 X 22 cm area (Lehrach, et al., 1990). A 
limitation with this approach is that the volume of DNA 
spotted in each pixel of each array is highly variable. 
In addition, the number of arrays that can be made with 
each dipping is usually quite small. 

An alternate method of creating ordered arrays of 
nucleic acid sequences is described by Pirrung, et al. 
(1992), and also by Fodor, et al. (1991). The method 
involves synthesizing different nucleic acid sequences 
at different discrete regions of a support. This 
method employs elaborate synthetic schemes, and is 
generally limited to relatively short nucleic acid 
samples, e.g., less than 20 bases. A related method has 
been described by Southern, et al. (1992). 

Khrapko, et al. (1991) describes a method of 
making an oligonucleotide matrix by spotting DNA onto a 
thin layer of polyacrylamide. The spotting is done 
manually with a micropipette. 

None of the methods or devices described in the 
prior art are designed for mass fabrication of 
microarrays characterized by (i) a large number of 
micro-sized assay regions separated by a distance of 
50-200 microns or less, and (ii) a well-defined amount, 
typically in the picomole range, of analyte associated 
with each region of the array. 
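
To give a sense of scale for the spacing figures above, the following Python sketch estimates how many assay regions fit in one square centimeter. It is an illustration added here under a stated assumption: the "distance of 50-200 microns" is treated as the center-to-center pitch of a square grid. The function name `spots_per_cm2` is hypothetical.

```python
def spots_per_cm2(pitch_um):
    """Spots that fit in 1 cm^2 on a square grid with the given pitch in microns."""
    per_side = int(10_000 // pitch_um)   # 1 cm = 10,000 microns
    return per_side ** 2

for pitch in (200, 100, 50):
    print(pitch, spots_per_cm2(pitch))

# Pin-spotted membrane cited in the Background: 9216 spots in a 22 x 22 cm area.
membrane_density = 9216 / (22 * 22)
print(round(membrane_density, 1))
```

On this assumption, even the coarsest 200 micron pitch packs roughly a hundred times more regions per unit area than the pin-spotted membrane array.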

Furthermore, current technology is directed at 
performing such assays one at a time to a single array 
of DNA molecules. For example, the most common method 
for performing DNA hybridizations to arrays spotted 
onto porous membrane involves sealing the membrane in a 
plastic bag (Maniatas, et al., 1989) or a rotating 
glass cylinder (Robbins Scientific) with the labeled 
hybridization probe inside the sealed chamber. For 
arrays made on non-porous surfaces, such as a 
microscope slide, each array is incubated with the 
labeled hybridization probe sealed under a coverslip. 
These techniques require a separate sealed chamber for 
each array, which makes the screening and handling of 
many such arrays inconvenient and time intensive. 

Abouzied, et al. (1994) describes a method of 
printing horizontal lines of antibodies on a 
nitrocellulose membrane and separating regions of the 
membrane with vertical stripes of a hydrophobic 
material. Each vertical stripe is then reacted with a 
different antigen and the reaction between the 
immobilized antibody and an antigen is detected using a 
standard ELISA colorimetric technique. Abouzied's 
technique makes it possible to screen many 
one-dimensional arrays simultaneously on a single sheet of 
nitrocellulose. Abouzied makes the nitrocellulose 
somewhat hydrophobic using a line drawn with PAP Pen 
(Research Products International). However, Abouzied 
does not describe a technology that is capable of 
completely sealing the pores of the nitrocellulose. The 
pores of the nitrocellulose are still physically open 
and so the assay reagents can leak through the 
hydrophobic barrier during extended high temperature 
incubations or in the presence of detergents, which 
makes the Abouzied technique unacceptable for DNA 
hybridization assays. 

Porous membranes with printed patterns of 
hydrophilic/hydrophobic regions exist for applications 
such as ordered arrays of bacteria colonies. QA Life 
Sciences (San Diego CA) makes such a membrane with a 
grid pattern printed on it. However, this membrane has 
the same disadvantage as the Abouzied technique since 
reagents can still flow between the gridded arrays, 
making them unusable for separate DNA hybridization 
assays. 

Pall Corporation makes a 96-well plate with a 
porous filter heat sealed to the bottom of the plate. 
These plates are capable of containing different 
reagents in each well without cross-contamination. 
However, each well is intended to hold only one target 
element whereas the invention described here makes a 
microarray of many biomolecules in each subdivided 
region of the solid support. Furthermore, the 96 well 
plates are at least 1 cm thick and prevent the use of 
the device for many colorimetric, fluorescent and 
radioactive detection formats which require that the 
membrane lie flat against the detection surface. The 
invention described here requires no further processing 
after the assay step since the barrier elements are 
shallow and do not interfere with the detection step, 
thereby greatly increasing convenience. 

Hyseq Corporation has described a method of making 
an "array of arrays" on a non-porous solid support for 
use with their sequencing by hybridization technique. 
The method described by Hyseq involves modifying the 
chemistry of the solid support material to form a 
hydrophobic grid pattern where each subdivided region 
contains a microarray of biomolecules. Hyseq's flat 
hydrophobic pattern does not make use of physical 
blocking as an additional means of preventing cross 
contamination. 

Summary of the Invention 

The invention includes, in one aspect, a method of 
forming a microarray of analyte-assay regions on a 
solid support, where each region in the array has a 
known amount of a selected, analyte-specific reagent. 
The method involves first loading a solution of a 
selected analyte-specific reagent in a 
reagent-dispensing device having an elongate capillary channel 
(i) formed by spaced-apart, coextensive elongate 
members, (ii) adapted to hold a quantity of the reagent 
solution and (iii) having a tip region at which aqueous 
solution in the channel forms a meniscus. The channel 
is preferably formed by a pair of spaced-apart tapered 
elements. 

The tip of the dispensing device is tapped against 
a solid support at a defined position on the support 
surface with an impulse effective to break the meniscus 
in the capillary channel and deposit a selected volume of 
solution on the surface, preferably a selected volume 
in the range 0.01 to 100 nl. The two steps are 
repeated until the desired array is formed. 

The method may be practiced in forming a plurality 
of such arrays, where the solution-depositing step is 
applied to a selected position on each of a 
plurality of solid supports at each repeat cycle. 

The dispensing device may be loaded with a new 
solution, by the steps of (i) dipping the capillary 
channel of the device in a wash solution, (ii) removing 
wash solution drawn into the capillary channel, and 
(iii) dipping the capillary channel into the new 
reagent solution. 

Also included in the invention is an automated 
apparatus for forming a microarray of analyte-assay 
regions on a plurality of solid supports, where each 
region in the array has a known amount of a selected, 
analyte-specific reagent. The apparatus has a holder 
for holding, at known positions, a plurality of planar 
supports, and a reagent dispensing device of the type 
described above. 

The apparatus further includes positioning 
structure for positioning the dispensing device at a 
selected array position with respect to a support in 
said holder, and dispensing structure for moving the 
dispensing device into tapping engagement against a 
support with a selected impulse effective to deposit a 
selected volume on the support, e.g., a selected volume 
in the volume range 0.01 to 100 nl. 

The positioning and dispensing structures are 
controlled by a control unit in the apparatus. The 
unit operates to (i) place the dispensing device at a 
loading station, (ii) move the capillary channel in the 
device into a selected reagent at the loading station, 
to load the dispensing device with the reagent, and 
(iii) dispense the reagent at a defined array position 
on each of the supports on said holder. The unit may 
further operate, at the end of a dispensing cycle, to 
wash the dispensing device by (i) placing the 
dispensing device at a washing station, (ii) moving the 
capillary channel in the device into a wash fluid, to 
load the dispensing device with the fluid, and (iii) 
remove the wash fluid prior to loading the dispensing 
device with a fresh selected reagent. 
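
The control sequence described above (load a reagent, dispense it at one array position on every support, then wash before the next reagent) can be sketched as a simple schedule generator. This Python sketch is illustrative only and is not the control unit of the apparatus: the function name `dispensing_schedule`, the tuple layout of each step, and the choice to wash after every reagent, including the last, are assumptions made here.

```python
def dispensing_schedule(reagents, n_supports):
    """List the control-unit operations for printing each reagent on every support.

    Reagent number k is deposited at array position k on each support, so the
    finished arrays on all supports are identical.
    """
    steps = []
    for position, reagent in enumerate(reagents):
        steps.append(("load", reagent))
        for support in range(n_supports):
            steps.append(("dispense", reagent, support, position))
        steps.append(("wash",))
    return steps

# Hypothetical run: two reagents printed onto three supports.
schedule = dispensing_schedule(["cDNA-1", "cDNA-2"], n_supports=3)
for step in schedule:
    print(step)
```

Looping over supports inside the loop over reagents reflects the point made in the text that a single loading serves a defined position on each of the supports before the device is washed.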

The dispensing device in the apparatus may be one 
of a plurality of such devices which are carried on the 
arm for dispensing different analyte assay reagents at 
selected spaced array positions. 

In another aspect, the invention includes a
substrate with a surface having a microarray of at
least 10^3 distinct polynucleotide or polypeptide
biopolymers in a surface area of less than about 1 cm^2.
Each distinct biopolymer (i) is disposed at a separate,
defined position in said array, (ii) has a length of at
least 50 subunits, and (iii) is present in a defined
amount between about 0.1 femtomoles and 100 nanomoles.

In one embodiment, the surface is a glass slide
surface coated with a polycationic polymer, such as
polylysine, and the biopolymers are polynucleotides.
In another embodiment, the substrate has a water-
impermeable backing, a water-permeable film formed on
the backing, and a grid formed on the film. The grid
is composed of intersecting water-impervious grid
elements extending from said backing to positions
raised above the surface of said film, and partitions
the film into a plurality of water-impervious cells. A
biopolymer array is formed within each cell.

More generally, there is provided a substrate for
use in detecting binding of labeled polynucleotides to
one or more of a plurality of different-sequence,
immobilized polynucleotides. The substrate includes,
in one aspect, a glass support, a coating of a
polycationic polymer, such as polylysine, on said
surface of the support, and an array of distinct
polynucleotides electrostatically bound non-covalently
to said coating, where each distinct biopolymer is
disposed at a separate, defined position in a surface
array of polynucleotides.

In another aspect, the substrate includes a water-
impermeable backing, a water-permeable film formed on
the backing, and a grid formed on the film, where the
grid is composed of intersecting water-impervious grid
elements extending from the backing to positions raised
above the surface of the film, forming a plurality of
cells. A biopolymer array is formed within each cell.

Also forming part of the invention is a method of
detecting differential expression of each of a
plurality of genes in a first cell type, with respect
to expression of the same genes in a second cell type.
In practicing the method, fluorescent-labeled cDNA's
are first produced from mRNA's isolated from the two
cell types, where the cDNA's from the first and second
cell types are labeled with first and second different
fluorescent reporters.

A mixture of the labeled cDNA's from the two cell
types is added to an array of polynucleotides
representing a plurality of known genes derived from
the two cell types, under conditions that result in
hybridization of the cDNA's to complementary-sequence
polynucleotides in the array. The array is then
examined by fluorescence under fluorescence excitation
conditions in which (i) polynucleotides in the array
that are hybridized predominantly to cDNA's derived
from one of the first and second cell types give a
distinct first or second fluorescence emission color,
respectively, and (ii) polynucleotides in the array
that are hybridized to substantially equal numbers of
cDNA's derived from the first and second cell types
give a distinct combined fluorescence emission color.
The relative expression of known genes
in the two cell types can then be determined by the
observed fluorescence emission color of each spot.
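
By way of illustration only, the color-based readout just described can be
sketched in Python. The function name, the use of measured green and red
intensities as stand-ins for the first and second reporters, and the ratio
threshold of 2.0 are assumptions made for this sketch; the specification
does not state a numerical criterion for "predominantly" or
"substantially equal".

```python
def classify_spot(green: float, red: float, ratio_threshold: float = 2.0) -> str:
    """Classify one array spot from its two reporter intensities.

    green: signal from the first reporter (first cell type), assumed >= 0
    red:   signal from the second reporter (second cell type), assumed >= 0
    ratio_threshold: hypothetical cutoff for "predominantly" one cell type
    """
    if green < 0 or red < 0:
        raise ValueError("intensities must be non-negative")
    if green == 0 and red == 0:
        return "none"  # nothing hybridized at this spot
    # Predominantly first-reporter signal: gene more expressed in cell type 1.
    if red == 0 or green / red >= ratio_threshold:
        return "first"
    # Predominantly second-reporter signal: gene more expressed in cell type 2.
    if green == 0 or red / green >= ratio_threshold:
        return "second"
    # Roughly equal signals: the combined emission color.
    return "combined"
```

For example, a spot with intensities (100, 10) would be reported as
"first", and a spot with intensities (50, 60) as "combined".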

These and other objects and features of the
invention will become more fully apparent when the
following detailed description of the invention is read
in conjunction with the accompanying figures.

Brief Description of the Drawings

Fig. 1 is a side view of a reagent-dispensing
device having an open-capillary dispensing head
constructed for use in one embodiment of the invention;

Figs. 2A-2C illustrate steps in the delivery of a
fixed-volume bead on a hydrophobic surface employing
the dispensing head from Fig. 1, in accordance with one
embodiment of the method of the invention;

Fig. 3 shows a portion of a two-dimensional array
of analyte-assay regions constructed according to the
method of the invention;

Fig. 4 is a planar view showing components of an
automated apparatus for forming arrays in accordance
with the invention;

Fig. 5 shows a fluorescent image of an actual 20 x
20 array of 400 fluorescently-labeled DNA samples
immobilized on a poly-l-lysine coated slide, where the
total area covered by the 400 element array is 16
square millimeters;

Fig. 6 is a fluorescent image of a 1.8 cm x 1.8 cm
microarray containing lambda clones with yeast inserts,
the fluorescent signal arising from the hybridization
to the array with approximately half the yeast genome
labeled with a green fluorophore and the other half
with a red fluorophore;

Fig. 7 shows the translation of the hybridization
image of Fig. 6 into a karyotype of the yeast genome,
where the elements of the Fig. 6 microarray contain yeast
DNA sequences that have been previously physically
mapped in the yeast genome;

Fig. 8 shows a fluorescent image of a 0.5 cm x 0.5
cm microarray of 24 cDNA clones, where the microarray
was hybridized simultaneously with total cDNA from a wild
type Arabidopsis plant labeled with a green fluorophore
and total cDNA from a transgenic Arabidopsis plant
labeled with a red fluorophore, and the arrow points to
the cDNA clone representing the gene introduced into
the transgenic Arabidopsis plant;

Fig. 9 shows a plan view of a substrate having an
array of cells formed by barrier elements in the form
of a grid;

Fig. 10 shows an enlarged plan view of one of the
cells in the substrate in Fig. 9, showing an array of
polynucleotide regions in the cell;

Fig. 11 is an enlarged sectional view of the
substrate in Fig. 9, taken along a section line in that
figure; and

Fig. 12 is a scanned image of a 3 cm x 3 cm
nitrocellulose solid support containing four identical
arrays of M13 clones in each of four quadrants, where
each quadrant was hybridized simultaneously to a
different oligonucleotide using an open face
hybridization method.

Detailed Description of the Invention

I. Definitions

Unless indicated otherwise, the terms defined
below have the following meanings:

"Ligand" refers to one member of a ligand/anti-ligand
binding pair. The ligand may be, for example,
one of the nucleic acid strands in a complementary,
hybridized nucleic acid duplex binding pair; an
effector molecule in an effector/receptor binding pair;
or an antigen in an antigen/antibody or
antigen/antibody fragment binding pair.

"Antiligand" refers to the opposite member of a
ligand/anti-ligand binding pair. The antiligand may be
the other of the nucleic acid strands in a
complementary, hybridized nucleic acid duplex binding
pair; the receptor molecule in an effector/receptor
binding pair; or an antibody or antibody fragment
molecule in an antigen/antibody or antigen/antibody
fragment binding pair, respectively.

"Analyte" or "analyte molecule" refers to a
molecule, typically a macromolecule, such as a
polynucleotide or polypeptide, whose presence, amount,
and/or identity are to be determined. The analyte is
one member of a ligand/anti-ligand pair.

"Analyte-specific assay reagent" refers to a
molecule effective to bind specifically to an analyte
molecule. The reagent is the opposite member of a
ligand/anti-ligand binding pair.

An "array of regions on a solid support" is a
linear or two-dimensional array of preferably discrete
regions, each having a finite area, formed on the
surface of a solid support.

A "microarray" is an array of regions having a
density of discrete regions of at least about 100/cm^2,
and preferably at least about 1000/cm^2. The regions in
a microarray have typical dimensions, e.g., diameters,
in the range of between about 10-250 µm, and are
separated from other regions in the array by about the
same distance.

A support surface is "hydrophobic" if an aqueous-medium
droplet applied to the surface does not spread
out substantially beyond the area size of the applied
droplet. That is, the surface acts to prevent
spreading of the droplet applied to the surface by
hydrophobic interaction with the droplet.

A "meniscus" means a concave or convex surface
that forms on the bottom of a liquid in a channel as a
result of the surface tension of the liquid.

"Distinct biopolymers", as applied to the
biopolymers forming a microarray, means an array member
which is distinct from other array members on the basis
of a different biopolymer sequence, and/or different
concentrations of the same or distinct biopolymers,
and/or different mixtures of distinct or different-concentration
biopolymers. Thus an array of "distinct
polynucleotides" means an array containing, as its
members, (i) distinct polynucleotides, which may have a
defined amount in each member, (ii) different, graded
concentrations of given-sequence polynucleotides,
and/or (iii) different-composition mixtures of two or
more distinct polynucleotides.

"Cell type" means a cell from a given source,
e.g., a tissue, or organ, or a cell in a given state of
differentiation, or a cell associated with a given
pathology or genetic makeup.

II. Method of Microarray Formation

This section describes a method of forming a
microarray of analyte-assay regions on a solid support
or substrate, where each region in the array has a
known amount of a selected, analyte-specific reagent.
Fig. 1 illustrates, in a partially schematic view,
a reagent-dispensing device 10 useful in practicing the
method. The device generally includes a reagent
dispenser 12 having an elongate open capillary channel
14 adapted to hold a quantity of the reagent solution,
such as indicated at 16, as will be described below.
The capillary channel is formed by a pair of spaced-apart,
coextensive, elongate members 12a, 12b which are
tapered toward one another and converge at a tip or tip
region 18 at the lower end of the channel. More
generally, the open channel is formed by at least two
elongate, spaced-apart members adapted to hold a
quantity of reagent solutions and having a tip region
at which aqueous solution in the channel forms a
meniscus, such as the concave meniscus illustrated at
20 in Fig. 2A. The advantages of the open channel
construction of the dispenser are discussed below.

With continued reference to Fig. 1, the dispenser
device also includes structure for moving the dispenser
rapidly toward and away from a support surface, for
effecting deposition of a known amount of solution in
the dispenser on a support, as will be described below
with reference to Figs. 2A-2C. In the embodiment
shown, this structure includes a solenoid 22 which is
activatable to draw a solenoid piston 24 rapidly
downwardly, then release the piston, e.g., under spring
bias, to a normal, raised position, as shown. The
dispenser is carried on the piston by a connecting
member 26, as shown. The just-described moving
structure is also referred to herein as dispensing
means for moving the dispenser into engagement with a
solid support, for dispensing a known volume of fluid
on the support.

The dispensing device just described is carried on
an arm 28 that may be moved either linearly or in an x-y
plane to position the dispenser at a selected
deposition position, as will be described.

Figs. 2A-2C illustrate the method of depositing a
known amount of reagent solution in the just-described
dispenser on the surface of a solid support, such as
the support indicated at 30. The support is a polymer,
glass, or other solid-material support having a surface
indicated at 31.

In one general embodiment, the surface is a
relatively hydrophilic, i.e., wettable, surface, such as
a surface having native, bound or covalently attached
charged groups. One such surface described below is a
glass surface having an absorbed layer of a
polycationic polymer, such as poly-l-lysine.

In another embodiment, the surface has or is
formed to have a relatively hydrophobic character,
i.e., one that causes aqueous medium deposited on the
surface to bead. A variety of known hydrophobic
polymers, such as polystyrene, polypropylene, or
polyethylene have desired hydrophobic properties, as do
glass and a variety of lubricant or other hydrophobic
films that may be applied to the support surface.

Initially, the dispenser is loaded with a selected
analyte-specific reagent solution, such as by dipping
the dispenser tip, after washing, into a solution of
the reagent, and allowing filling by capillary flow
into the dispenser channel. The dispenser is now moved
to a selected position with respect to a support
surface, placing the dispenser tip directly above the
support-surface position at which the reagent is to be
deposited. This movement takes place with the
dispenser tip in its raised position, as seen in Fig.
2A, where the tip is typically several millimeters
(e.g., 1-5 mm) above the surface of the substrate.

With the dispenser so positioned, solenoid 22 is
now activated to cause the dispenser tip to move
rapidly toward and away from the substrate surface,
making momentary contact with the surface, in effect,
tapping the tip of the dispenser against the support
surface. The tapping movement of the tip against the
surface acts to break the liquid meniscus in the tip
channel, bringing the liquid in the tip into contact
with the support surface. This, in turn, produces a
flowing of the liquid into the capillary space between
the tip and the surface, acting to draw liquid out of
the dispenser channel, as seen in Fig. 2B.

Fig. 2C shows flow of fluid from the tip onto the
support surface, which in this case is a hydrophobic
surface. The figure illustrates that liquid continues
to flow from the dispenser onto the support surface
until it forms a liquid bead 32. At a given bead size,
i.e., volume, the tendency of liquid to flow onto the
surface will be balanced by the hydrophobic surface
interaction of the bead with the support surface, which
acts to limit the total bead area on the surface, and
by the surface tension of the droplet, which tends
toward a given bead curvature. At this point, a given
bead volume will have formed, and continued contact of
the dispenser tip with the bead, as the dispenser tip
is being withdrawn, will have little or no effect on
bead volume.

For liquid-dispensing on a more hydrophilic
surface, the liquid will have less of a tendency to
bead, and the dispensed volume will be more sensitive
to the total dwell time of the dispenser tip in the
immediate vicinity of the support surface, e.g., the
positions illustrated in Figs. 2B and 2C.

The desired deposition volume, i.e., bead volume,
formed by this method is preferably in the range 2 pl
(picoliters) to 2 nl (nanoliters), although volumes as
high as 100 nl or more may be dispensed. It will be
appreciated that the selected dispensed volume will
depend on (i) the "footprint" of the dispenser tip,
i.e., the size of the area spanned by the tip, (ii) the
hydrophobicity of the support surface, and (iii) the
time of contact with and rate of withdrawal of the tip
from the support surface. In addition, bead size may
be reduced by increasing the viscosity of the medium,
effectively reducing the flow time of liquid from the
dispenser onto the support surface. The drop size may
be further constrained by depositing the drop in a
hydrophilic region surrounded by a hydrophobic grid
pattern on the support surface.

In a typical embodiment, the dispenser tip is
tapped rapidly against the support surface, with a
total residence time in contact with the support of
less than about 1 msec, and a rate of upward travel
from the surface of about 10 cm/sec.

Assuming that the bead that forms on contact with
the surface is a hemispherical bead, with a diameter
approximately equal to the width of the dispenser tip,
as shown in Fig. 2C, the volume of the bead formed in
relation to dispenser tip width (d) is given in Table 1
below. As seen, the volume of the bead ranges between
2 pl and 2 nl as the width size is increased from about
20 to 200 µm.

Table 1

d          Volume (nl)
20 µm      2 x 10^-3
50 µm      3.1 x 10^-2
100 µm     2.5 x 10^-1
200 µm     2
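
By way of illustration only, the hemispherical-bead assumption stated above
can be checked with a short Python sketch. The function name is
hypothetical; the only inputs are the geometry of a hemisphere,
V = (2/3)·π·(d/2)^3, and the unit conversion 1 nl = 10^6 µm^3. The computed
values agree with the tabulated volumes to within rounding.

```python
import math


def hemispherical_bead_volume_nl(tip_width_um: float) -> float:
    """Volume (in nanoliters) of a hemispherical bead whose diameter
    equals the dispenser tip width d (in micrometers)."""
    radius_um = tip_width_um / 2.0
    # Hemisphere volume is half of a sphere: (2/3) * pi * r^3, in cubic µm.
    volume_um3 = (2.0 / 3.0) * math.pi * radius_um ** 3
    # 1 nl = 1e-12 m^3 = 1e6 µm^3.
    return volume_um3 / 1.0e6


if __name__ == "__main__":
    for d in (20, 50, 100, 200):
        print(f"d = {d:>3} µm  ->  {hemispherical_bead_volume_nl(d):.3g} nl")
```

For tip widths of 20 and 200 µm the sketch gives roughly 2 pl and 2 nl,
respectively, matching the end points of the preferred volume range.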



At a given tip size, bead volume can be reduced in
a controlled fashion by increasing surface
hydrophobicity, reducing time of contact of the tip
with the surface, increasing rate of movement of the
tip away from the surface, and/or increasing the
viscosity of the medium. Once these parameters are
fixed, a selected deposition volume in the desired pl
to nl range can be achieved in a repeatable fashion.

After depositing a bead at one selected location
on a support, the tip is typically moved to a
corresponding position on a second support, a droplet
is deposited at that position, and this process is
repeated until a liquid droplet of the reagent has been
deposited at a selected position on each of a plurality
of supports.

The tip is then washed to remove the reagent
liquid, filled with another reagent liquid, and this
reagent is now deposited at another array position
on each of the supports. In one embodiment, the tip is
washed and refilled by the steps of (i) dipping the
capillary channel of the device in a wash solution,
(ii) removing wash solution drawn into the capillary
channel, and (iii) dipping the capillary channel into
the new reagent solution.

From the foregoing, it will be appreciated that
the tweezers-like, open-capillary dispenser tip
provides the advantages that (i) the open channel of
the tip facilitates rapid, efficient washing and drying
before reloading the tip with a new reagent, (ii)
passive capillary action can load the sample directly
from a standard microwell plate while retaining
sufficient sample in the open capillary reservoir for
the printing of numerous arrays, (iii) open capillaries
are less prone to clogging than closed capillaries, and
(iv) open capillaries do not require a perfectly faced
bottom surface for fluid delivery.

A portion of a microarray 36 formed on the surface
38 of a solid support 40 in accordance with the method
just described is shown in Fig. 3. The array is formed
of a plurality of analyte-specific reagent regions,
such as regions 42, where each region may include a
different analyte-specific reagent. As indicated
above, the diameter of each region is preferably
between about 20-200 µm. The spacing between each
region and its closest (non-diagonal) neighbor,
measured from center-to-center (indicated at 44), is
preferably in the range of about 20-400 µm. Thus, for
example, an array having a center-to-center spacing of
about 250 µm contains about 40 regions/cm or 1,600
regions/cm^2. After formation of the array, the support
is treated to evaporate the liquid of the droplet
forming each region, to leave a desired array of dried,
relatively flat regions. This drying may be done by
heating or under vacuum.
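
By way of illustration only, the relation between center-to-center spacing
and array density given above can be sketched in Python. The function names
are hypothetical, and the sketch assumes a square grid in which the spacing
is the same in both directions.

```python
def regions_per_cm(spacing_um: float) -> float:
    """Number of regions along 1 cm for a given center-to-center
    spacing in micrometers (1 cm = 10,000 µm)."""
    return 10_000.0 / spacing_um


def regions_per_cm2(spacing_um: float) -> float:
    """Areal density of regions, assuming the same spacing along
    both axes of a square grid."""
    return regions_per_cm(spacing_um) ** 2


if __name__ == "__main__":
    # The 250 µm example from the text: 40 regions/cm, 1,600 regions/cm^2.
    print(regions_per_cm(250), regions_per_cm2(250))
```

On the same assumption, a spacing of 100 µm would correspond to 100
regions/cm, or 10,000 regions/cm^2.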

In some cases, it is desired to first rehydrate
the droplets containing the analyte reagents to allow
for more time for adsorption to the solid support. It
is also possible to spot out the analyte reagents in a
humid environment so that droplets do not dry until the
arraying operation is complete.

III. Automated Apparatus for Forming Arrays

In another aspect, the invention includes an
automated apparatus for forming an array of analyte-assay
regions on a solid support, where each region in
the array has a known amount of a selected, analyte-specific
reagent.

The apparatus is shown in planar, and partially
schematic view in Fig. 4. A dispenser device 72 in the
apparatus has the basic construction described above
with respect to Fig. 1, and includes a dispenser 74
having an open-capillary channel terminating at a tip,
substantially as shown in Figs. 1 and 2A-2C.

The dispenser is mounted in the device for
movement toward and away from a dispensing position at
which the tip of the dispenser taps a support surface,
to dispense a selected volume of reagent solution, as
described above. This movement is effected by a
solenoid 76 as described above. Solenoid 76 is under
the control of a control unit 77 whose operation will
be described below. The solenoid is also referred to
herein as dispensing means for moving the device into
tapping engagement with a support, when the device is
positioned at a defined array position with respect to
that support.

The dispenser device is carried on an arm 74 which
is threadedly mounted on a worm screw 80 driven
(rotated) in a desired direction by a stepper motor 82
also under the control of unit 77. At its left end in
the figure, screw 80 is carried in a sleeve 84 for
rotation about the screw axis. At its other end, the
screw is mounted to the drive shaft of the stepper
motor, which in turn is carried on a sleeve 86. The
dispenser device, worm screw, the two sleeves mounting
the worm screw, and the stepper motor used in moving
the device in the "x" (horizontal) direction in the
figure form what is referred to here collectively as a
displacement assembly 86.

The displacement assembly is constructed to
produce precise, micro-range movement in the direction
of the screw, i.e., along an x axis in the figure. In
one mode, the assembly functions to move the dispenser
in x-axis increments having a selected distance in the
range 5-25 µm. In another mode, the dispenser unit may
be moved in precise x-axis increments of several
microns or more, for positioning the dispenser at
associated positions on adjacent supports, as will be
described below.

The displacement assembly, in turn, is mounted for
movement in the "y" (vertical) axis of the figure, for
positioning the dispenser at a selected y axis
position. The structure mounting the assembly includes
a fixed rod 88 mounted rigidly between a pair of frame
bars 90, 92, and a worm screw 94 mounted for rotation
between a pair of frame bars 96, 98. The worm screw is
driven (rotated) by a stepper motor 100 which operates
under the control of unit 77. The motor is mounted on
bar 96, as shown.

The structure just described, including worm screw
94 and motor 100, is constructed to produce precise,
micro-range movement in the direction of the screw,
i.e., along a y axis in the figure. As above, the
structure functions in one mode to move the dispenser
in y-axis increments having a selected distance in the
range 5-250 µm, and in a second mode, to move the
dispenser in precise y-axis increments of several
microns (µm) or more, for positioning the dispenser at
associated positions on adjacent supports.

The displacement assembly and structure for moving
this assembly in the y axis are referred to herein
collectively as positioning means for positioning the
dispensing device at a selected array position with
respect to a support.

A holder 102 in the apparatus functions to hold a
plurality of supports, such as supports 104, on which
the microarrays of reagent regions are to be formed by
the apparatus. The holder provides a number of
recessed slots, such as slot 106, which receive the
supports, and position them at precise selected
positions with respect to the frame bars on which the
dispenser moving means is mounted.

As noted above, the control unit in the device
functions to actuate the two stepper motors and
dispenser solenoid in a sequence designed for automated
operation of the apparatus in forming a selected
microarray of reagent regions on each of a plurality of
supports.

The control unit is constructed, according to
conventional microprocessor control principles, to
provide appropriate signals to each of the solenoid and
each of the stepper motors, in a given timed sequence
and for appropriate signalling time. The construction
of the unit, and the settings that are selected by the
user to achieve a desired array pattern, will be
understood from the following description of a typical
apparatus operation.

Initially, one or more supports are placed in one
or more slots in the holder. The dispenser is then
moved to a position directly above a well (not shown)
containing a solution of the first reagent to be
dispensed on the support(s). The dispenser solenoid is
now actuated to lower the dispenser tip into this well,
causing the capillary channel in the dispenser to fill.
Motors 82, 100 are now actuated to position the
dispenser at a selected array position at the first of
the supports. Solenoid actuation of the dispenser is
then effective to dispense a selected-volume droplet of
that reagent at this location. As noted above, this
operation is effective to dispense a selected volume
preferably between 2 pl and 2 nl of the reagent
solution.

The dispenser is now moved to the corresponding
position at an adjacent support and a similar volume of
the solution is dispensed at this position. The
process is repeated until the reagent has been
dispensed at this preselected corresponding position on
each of the supports.

Where it is desired to dispense a single reagent
at more than two array positions on a support, the
dispenser may be moved to different array positions at
each support, before moving the dispenser to a new
support, or solution can be dispensed at individual
positions on each support, at one selected position,
then the cycle repeated for each new array position.

To dispense the next reagent, the dispenser is
positioned over a wash solution (not shown), and the
dispenser tip is dipped in and out of this solution
until the reagent solution has been substantially
washed from the tip. Solution can be removed from the
tip, after each dipping, by vacuum, compressed air
spray, sponge, or the like.

The dispenser tip is now dipped in a second
reagent well, and the filled tip is moved to a second
selected array position in the first support. The
process of dispensing reagent at each of the
corresponding second-array positions is then carried out as
above. This process is repeated until an entire
microarray of reagent solutions on each of the supports
has been formed.
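
By way of illustration only, the order of operations in the typical
apparatus operation just described can be sketched in Python. The function
name, the tuple-based action records, and the assumption that each reagent
occupies exactly one array position are conveniences of this sketch; they
are not a description of the control unit's actual microprocessor program.

```python
from typing import Iterator, List


def dispensing_sequence(reagents: List[str], num_supports: int) -> Iterator[tuple]:
    """Yield control actions in the order described in the text:
    load a reagent, tap it onto the same array position of every
    support, then wash the tip before loading the next reagent."""
    for position, reagent in enumerate(reagents):
        if position > 0:
            # Wash the tip between reagents to avoid carry-over.
            yield ("wash",)
        # Dip the tip into the reagent well; the channel fills by capillarity.
        yield ("load", reagent)
        for support in range(num_supports):
            # Position over the support, then actuate the solenoid to tap.
            yield ("dispense", reagent, support, position)


if __name__ == "__main__":
    for action in dispensing_sequence(["reagent A", "reagent B"], num_supports=2):
        print(action)
```

Ordering the loop by reagent first and support second reflects the stated
practice of depositing one reagent on every support before the tip is
washed and reloaded.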



This section describes embodiments of a substrate
having a microarray of biological polymers carried on
the substrate surface. Subsection A describes a multi-cell
substrate, each cell of which contains a
microarray, and preferably an identical microarray, of
distinct biopolymers, such as distinct polynucleotides,
formed on a porous surface. Subsection B describes a
microarray of distinct polynucleotides bound on a glass
slide coated with a polycationic polymer.

A. Multi-Cell Substrate

Fig. 9 illustrates, in plan view, a substrate 110
constructed according to the invention. The substrate
has an 8 x 12 rectangular array 112 of cells, such as
cells 114, 116, formed on the substrate surface. With
reference to Fig. 10, each cell, such as cell 114, in
turn supports a microarray 118 of distinct biopolymers,
such as polypeptides or polynucleotides at known,
addressable regions of the microarray. Two such
regions forming the microarray are indicated at 120,
and correspond to regions, such as regions 42, forming
the microarray of distinct biopolymers shown in Fig. 3.

The 96-cell array shown in Fig. 9 typically has
array dimensions between about 12 and 244 mm in width
and 8 and 400 mm in length, with the cells in the array
having width and length dimensions of 1/12 and 1/8 the
array width and length dimensions, respectively, i.e.,
between about 1 and 20 mm in width and 1 and 50 mm in
length.
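
By way of illustration only, the cell dimensions quoted above follow
directly from dividing the array dimensions by the number of cells along
each side. In the Python sketch below, the function name and default
arguments are hypothetical; the 12-by-8 division is taken from the text.

```python
def cell_dimensions_mm(array_width_mm: float, array_length_mm: float,
                       cells_across: int = 12, cells_along: int = 8) -> tuple:
    """Return (cell width, cell length) in mm for an array divided into
    cells_across cells over its width and cells_along over its length."""
    return array_width_mm / cells_across, array_length_mm / cells_along


if __name__ == "__main__":
    print(cell_dimensions_mm(12, 8))     # smallest array quoted in the text
    print(cell_dimensions_mm(244, 400))  # largest array quoted in the text
```

The smallest quoted array yields cells of 1 mm by 1 mm, and the largest
yields cells of about 20 mm by 50 mm, consistent with the stated ranges.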

The construction of the substrate is shown cross-sectionally
in Fig. 11, which is an enlarged sectional
view taken along view line 124 in Fig. 9. The
substrate includes a water-impermeable backing 126,
such as a glass slide or rigid polymer sheet. Formed
on the surface of the backing is a water-permeable film
128. The film is formed of a porous membrane material,
such as nitrocellulose membrane, or a porous web
material, such as a nylon, polypropylene, or PVDF
porous polymer material. The thickness of the film is
preferably between about 10 and 1000 µm. The film may
be applied to the backing by spraying or coating
uncured material on the backing, or by applying a
preformed membrane to the backing. The backing and
film may be obtained as a preformed unit from a
commercial source, e.g., a plastic-backed
nitrocellulose film available from Schleicher and
Schuell Corporation.

With continued reference to Fig. 11, the film-covered
surface in the substrate is partitioned into a
desired array of cells by water-impermeable grid lines,
such as lines 130, 132, which have infiltrated the film
down to the level of the backing, and extend above the
surface of the film as shown, typically a distance of
100 to 2000 µm above the film surface.

The grid lines are formed on the substrate by
laying down an uncured or otherwise flowable resin or
elastomer solution in an array grid, allowing the
material to infiltrate the porous film down to the
backing, then curing or otherwise hardening the grid
lines to form the cell-array substrate.

One preferred material for the grid is a flowable
silicone available from Loctite Corporation. The
barrier material can be extruded through a narrow
syringe (e.g., 22 gauge) using air pressure or
mechanical pressure. The syringe is moved relative to
the solid support to print the barrier elements as a
grid pattern. The extruded bead of silicone wicks into
the pores of the solid support and cures to form a
shallow waterproof barrier separating the regions of
the solid support.

In alternative embodiments, the barrier element
can be a wax-based material or a thermoset material
such as epoxy. The barrier material can also be a UV-curing
polymer which is exposed to UV light after being
printed onto the solid support. The barrier material
may also be applied to the solid support using printing
techniques such as silk-screen printing. The barrier
material may also be a heat-seal stamping of the porous
solid support which seals its pores and forms a water-impervious
barrier element. The barrier material may
also be a shallow grid which is laminated or otherwise
adhered to the solid support.

In addition to plastic-backed nitrocellulose, the
solid support can be virtually any porous membrane with
or without a non-porous backing. Such membranes are
readily available from numerous vendors and are made
from nylon, PVDF, polysulfone and the like. In an
alternative embodiment, the barrier element may also be
used to adhere the porous membrane to a non-porous
backing in addition to functioning as a barrier to
prevent cross contamination of the assay reagents.

In an alternative embodiment, the solid support
can be of a non-porous material. The barrier can be
printed either before or after the microarray of
biomolecules is printed on the solid support.

As can be appreciated, the cells formed by the
grid lines and the underlying backing are
water-impermeable, having side barriers projecting above
the porous film in the cells. Thus, defined-volume
samples can be placed in each well without risk of
cross-contamination with sample material in adjacent
cells. In Fig. 11, defined-volume samples, such as
sample 134, are shown in the cells.

As noted above, each well contains a microarray of
distinct biopolymers. In one general embodiment, the
microarrays in the wells are identical arrays of
distinct biopolymers, e.g., different sequence
polynucleotides. Such arrays can be formed in
accordance with the methods described in Section II, by
depositing a first selected polynucleotide at the same
selected microarray position in each of the cells, then
depositing a second polynucleotide at a different
microarray position in each well, and so on until a
complete, identical microarray is formed in each cell.
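
The deposition order just described can be sketched as a nested loop, with the reagent in the outer loop and the cell in the inner loop, so that one loaded reagent is printed at the same position in every cell before the next reagent is loaded. This is a minimal illustration only; the function name, the reagent labels and the cell count are hypothetical and are not taken from the specification.

```python
from typing import Dict, List, Tuple


def print_identical_arrays(reagents: List[str], n_cells: int) -> Dict[int, List[Tuple[int, str]]]:
    """Return, for each cell, the (position, reagent) pairs in print order."""
    cells: Dict[int, List[Tuple[int, str]]] = {c: [] for c in range(n_cells)}
    for position, reagent in enumerate(reagents):  # one reagent per microarray position
        for cell in range(n_cells):                # same position in every cell
            cells[cell].append((position, reagent))
    return cells


# Hypothetical labels; three reagents printed into four cells.
arrays = print_identical_arrays(["polyA", "polyB", "polyC"], n_cells=4)
```

Because every cell receives the same reagent at the same position, the resulting microarrays are identical by construction.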

In a preferred embodiment, each microarray
contains about 10^3 distinct polynucleotide or
polypeptide biopolymers per surface area of less than
about 1 cm^2. Also in a preferred embodiment, the
biopolymers in each microarray region are present in a
defined amount between about 0.1 femtomoles and 100
nanomoles. The ability to form high-density arrays of
biopolymers, where each region is formed of a
well-defined amount of deposited material, can be
achieved in accordance with the microarray-forming
method described in Section II.

Also in a preferred embodiment, the biopolymers
are polynucleotides having lengths of at least about 50
bp, i.e., substantially longer than oligonucleotides
which can be formed in high-density arrays by schemes
involving parallel, step-wise polymer synthesis on the
array surface.

In the case of a polynucleotide array, in an assay
procedure, a small volume of the labeled DNA probe
mixture in a standard hybridization solution is loaded
onto each cell. The solution will spread to cover the
entire microarray and stop at the barrier elements.
The solid support is then incubated in a humid chamber
at the appropriate temperature as required by the
assay.

Each assay may be conducted in an "open-face"
format where no further sealing step is required, since
the hybridization solution will be kept properly
hydrated by the water vapor in the humid chamber. At
the conclusion of the incubation step, the entire solid
support containing the numerous microarrays is rinsed
quickly enough to dilute the assay reagents so that no
significant cross contamination occurs. The entire
solid support is then reacted with detection reagents
if needed and analyzed using standard colorimetric,
radioactive or fluorescent detection means. All
processing and detection steps are performed
simultaneously to all of the microarrays on the solid
support, ensuring uniform assay conditions for all of
the microarrays on the solid support.

B. Glass-Slide Polynucleotide Array
Fig. 5 shows a substrate 136 formed according to
another aspect of the invention, and intended for use
in detecting binding of labeled polynucleotides to one
or more of a plurality of distinct polynucleotides. The
substrate includes a glass substrate 138 having formed
on its surface a coating of a polycationic polymer,
preferably a cationic polypeptide, such as polylysine
or polyarginine. Formed on the polycationic coating is
a microarray 140 of distinct polynucleotides, each
localized at known selected array regions, such as
regions 142.

The slide is coated by placing a uniform-thickness
film of a polycationic polymer, e.g., poly-l-lysine, on
the surface of a slide and drying the film to form a
dried coating. The amount of polycationic polymer
added is sufficient to form at least a monolayer of
polymers on the glass surface. The polymer film is
bound to the surface via electrostatic binding between
negative silyl-OH groups on the surface and charged
amine groups in the polymers. Poly-l-lysine coated
glass slides may be obtained commercially, e.g., from
Sigma Chemical Co. (St. Louis, MO).

To form the microarray, defined volumes of
distinct polynucleotides are deposited on the
polymer-coated slide, as described in Section II.
According to an important feature of the substrate, the
deposited polynucleotides remain bound to the coated
slide surface non-covalently when an aqueous DNA sample
is applied to the substrate under conditions which allow
hybridization of reporter-labeled polynucleotides in
the sample to complementary-sequence (single-stranded)
polynucleotides in the substrate array. The method is
illustrated in Examples 1 and 2.

To illustrate this feature, a substrate of the
type just described, but having an array of
same-sequence polynucleotides, was mixed with
fluorescent-labeled complementary DNA under
hybridization conditions. After washing to remove
non-hybridized material, the substrate was examined by
low-power fluorescence microscopy. The array can be
visualized by the relatively uniform labeling pattern of
the array regions.

In a preferred embodiment, each microarray
contains at least 10^3 distinct polynucleotide or
polypeptide biopolymers per surface area of less than
about 1 cm^2. In the embodiment shown in Fig. 5, the
microarray contains 400 regions in an area of about 16
mm^2, or 2.5 x 10^3 regions/cm^2. Also in a preferred
embodiment, the polynucleotides in each microarray
region are present in a defined amount between about
0.1 femtomoles and 100 nanomoles. As above, the ability
to form high-density arrays of this type, where each
region is formed of a well-defined amount of deposited
material, can be achieved in accordance with the
microarray-forming method described in Section II.
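
The density quoted for the Fig. 5 embodiment follows directly from the region count and the array area. The short sketch below reproduces that arithmetic; the function name is illustrative only.

```python
def regions_per_cm2(n_regions: int, area_mm2: float) -> float:
    """Convert a region count over an area in mm^2 to regions per cm^2."""
    # 1 cm^2 = 100 mm^2, so multiply the count by 100 before dividing.
    return n_regions * 100.0 / area_mm2


# 400 regions in about 16 mm^2, as stated for the embodiment of Fig. 5.
density = regions_per_cm2(400, 16.0)
print(density)
```

The result, 2500 regions per cm^2, is the 2.5 x 10^3 figure given in the text and exceeds the preferred minimum of 10^3 per cm^2.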

Also in a preferred embodiment, the
polynucleotides have lengths of at least about 50 bp,
i.e., substantially longer than oligonucleotides which
can be formed in high-density arrays by various in situ
synthesis schemes.


V. Utility 

Microarrays of immobilized nucleic acid sequences
prepared in accordance with the invention can be used
for large scale hybridization assays in numerous
genetic applications, including genetic and physical
mapping of genomes, monitoring of gene expression, DNA
sequencing, genetic diagnosis, genotyping of organisms,
and distribution of DNA reagents to researchers.

For gene mapping, a gene or a cloned DNA fragment
is hybridized to an ordered array of DNA fragments, and
the identity of the DNA elements applied to the array
is unambiguously established by the pixel or pattern of
pixels of the array that are detected. One application
of such arrays for creating a genetic map is described
by Nelson, et al. (1993). In constructing physical
maps of the genome, arrays of immobilized cloned DNA
fragments are hybridized with other cloned DNA
fragments to establish whether the cloned fragments in
the probe mixture overlap and are therefore contiguous
to the immobilized clones on the array. For example,
Lehrach, et al., describe such a process.

The arrays of immobilized DNA fragments may also
be used for genetic diagnostics. To illustrate, an
array containing multiple forms of a mutated gene or
genes can be probed with a labeled mixture of a
patient's DNA which will preferentially interact with
only one of the immobilized versions of the gene.

The detection of this interaction can lead to a
medical diagnosis. Arrays of immobilized DNA fragments
can also be used in DNA probe diagnostics. For
example, the identity of a pathogenic microorganism can
be established unambiguously by hybridizing a sample of
the unknown pathogen's DNA to an array containing many
types of known pathogenic DNA. A similar technique can
also be used for unambiguous genotyping of any
organism. Other molecules of genetic interest, such as
cDNA's and RNA's, can be immobilized on the array or
alternately used as the labeled probe mixture that is
applied to the array.

In one application, an array of cDNA clones
representing genes is hybridized with total cDNA from
an organism to monitor gene expression for research or
diagnostic purposes. Labeling total cDNA from a normal
cell with one color fluorophore and total cDNA from a
diseased cell with another color fluorophore and
simultaneously hybridizing the two cDNA samples to the
same array of cDNA clones allows for differential gene
expression to be measured as the ratio of the two
fluorophore intensities. This two-color experiment can
be used to monitor gene expression in different tissue
types, disease states, response to drugs, or response
to environmental factors. An example of this approach
is illustrated in Example 2, described with respect to
Fig. 8.
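
The ratio measurement described above can be illustrated with a short sketch. The background-subtraction step, the function name and the sample intensities are assumptions made for illustration; the specification states only that differential expression is measured as the ratio of the two fluorophore intensities.

```python
import math


def expression_ratio(signal_a: float, signal_b: float, background: float = 0.0) -> float:
    """Ratio of two fluorophore intensities at one array element.

    background is a hypothetical per-spot offset subtracted from both channels.
    """
    a = max(signal_a - background, 0.0)
    b = max(signal_b - background, 0.0)
    if b == 0.0:
        # No signal in the reference channel: the ratio is unbounded or undefined.
        return math.inf if a > 0.0 else math.nan
    return a / b


# Hypothetical intensities: diseased-cell channel versus normal-cell channel.
ratio = expression_ratio(1050.0, 150.0, background=50.0)
```

A ratio near 1 would indicate similar expression in the two samples, while values far from 1 would indicate differential expression.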

By way of example and without implying a
limitation of scope, such a procedure could be used to
simultaneously screen many patients against all known
mutations in a disease gene. This invention could be
used in the form of, for example, 96 identical 0.9 cm x
2.2 cm microarrays fabricated on a single 12 cm x 18 cm
sheet of plastic-backed nitrocellulose where each
microarray could contain, for example, 100 DNA
fragments representing all known mutations of a given
gene. The region of interest from each of the DNA
samples from 96 patients could be amplified, labeled,
and hybridized to the 96 individual arrays with each
assay performed in 100 microliters of hybridization
solution. The approximately 1 mm thick silicone rubber
barrier elements between individual arrays prevent
cross contamination of the patient samples by sealing
the pores of the nitrocellulose and by acting as a
physical barrier between each microarray. The solid
support containing all 96 microarrays assayed with the
96 patient samples is incubated, rinsed, detected and
analyzed as a single sheet of material using standard
radioactive, fluorescent, or colorimetric detection
means (Maniatis, et al., 1989). Previously, such a
procedure would involve the handling, processing and
tracking of 96 separate membranes in 96 separate sealed
chambers. By processing all 96 arrays as a single
sheet of material, significant time and cost savings
are possible.
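
As a plausibility check on the dimensions in this example, the sketch below counts how many 0.9 cm x 2.2 cm arrays fit on a 12 cm x 18 cm sheet in each of the two axis-aligned orientations. It works in whole millimetres to avoid floating-point rounding and ignores the width of the barrier lines, which is a simplifying assumption; the function name is illustrative only.

```python
from typing import Tuple


def arrays_that_fit(sheet_mm: Tuple[int, int], array_mm: Tuple[int, int]) -> Tuple[int, int]:
    """Count axis-aligned arrays per sheet for both orientations (barrier width ignored)."""
    sheet_w, sheet_h = sheet_mm
    array_w, array_h = array_mm
    upright = (sheet_w // array_w) * (sheet_h // array_h)
    rotated = (sheet_w // array_h) * (sheet_h // array_w)
    return upright, rotated


# 12 cm x 18 cm sheet and 0.9 cm x 2.2 cm arrays, expressed in millimetres.
upright, rotated = arrays_that_fit((120, 180), (9, 22))
print(upright, rotated)
```

Both orientations accommodate at least the 96 arrays named in the example, leaving some margin for the grid lines themselves.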

The assay format can be reversed where the patient
or organism's DNA is immobilized as the array elements
and each array is hybridized with a different mutated
allele or genetic marker. The gridded solid support
can also be used for parallel non-DNA ELISA assays.
Furthermore, the invention allows for the use of all
standard detection methods without the need to remove
the shallow barrier elements to carry out the detection
step.

In addition to the genetic applications listed
above, arrays of whole cells, peptides, enzymes,
antibodies, antigens, receptors, ligands,
phospholipids, polymers, drug congener preparations or
described in this invention for large scale screening
assays in medical diagnostics, drug discovery,
molecular biology, immunology and toxicology.

The multi-cell substrate aspect of the invention
allows for the rapid and convenient screening of many
DNA probes against many ordered arrays of DNA
fragments. This eliminates the need to handle and
detect many individual arrays for performing mass
screenings for genetic research and diagnostic
applications. Numerous microarrays can be fabricated
on the same solid support and each microarray reacted
with a different DNA probe while the solid support is
processed as a single sheet of material.


The following examples illustrate, but in no way 
are intended to limit, the present invention. 

Example 1

Genomic-Complexity Hybridization to Micro
DNA Arrays Representing the Yeast
Saccharomyces cerevisiae Genome with
Two-color Fluorescent Detection

The array elements were randomly amplified PCR
(Bohlander, et al., 1992) products using physically
mapped lambda clones of S. cerevisiae genomic DNA
templates (Riles, et al., 1993). The PCR was performed
directly on the lambda phage lysates resulting in an
amplification of both the 35 kb lambda vector and the
5-15 kb yeast insert sequences in the form of a uniform
distribution of PCR product between 250-1500 base pairs
in length. The PCR product was purified using
Sephadex G50 gel filtration (Pharmacia, Piscataway, NJ)
and concentrated by evaporation to dryness at room
temperature overnight. Each of the 864 amplified
lambda clones was rehydrated in 15 µl of 3 x SSC in
preparation for spotting onto the glass.

The micro arrays were fabricated on microscope
slides which were coated with a layer of poly-l-lysine
(Sigma). The automated apparatus described in Section
IV loaded 1 µl of the concentrated lambda clone PCR
product in 3 x SSC directly from 96 well storage plates
into the open capillary printing element and deposited
~5 nl of sample per slide at 380 micron spacing between
spots, on each of 40 slides. The process was repeated
for all 864 samples and 8 control spots. After the
spotting operation was complete, the slides were
rehydrated in a humid chamber for 2 hours, baked in a
dry 80° vacuum oven for 2 hours, rinsed to remove
unabsorbed DNA and then treated with succinic anhydride
to reduce non-specific adsorption of the labeled
hybridization probe to the poly-l-lysine coated glass
surface. Immediately prior to use, the immobilized DNA
on the array was denatured in distilled water at 90°
for 2 minutes.
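
The printing figures in this paragraph can be checked with a little arithmetic. The sketch below assumes, purely for illustration, that the 872 spots (864 samples plus 8 controls) are laid out on a square grid; the specification does not state the layout. It also compares the volume dispensed across 40 slides with the 1 µl loaded into the capillary.

```python
import math

SPACING_UM = 380        # spot spacing stated in Example 1
N_SPOTS = 864 + 8       # amplified clones plus control spots
NL_PER_SPOT = 5         # approximate volume deposited per slide
N_SLIDES = 40
LOADED_NL = 1000        # 1 microliter drawn into the capillary


def square_grid_side(n_spots: int) -> int:
    """Smallest side length s such that s * s >= n_spots (assumed square layout)."""
    return math.isqrt(n_spots - 1) + 1


side = square_grid_side(N_SPOTS)
span_mm = (side - 1) * SPACING_UM / 1000   # distance between first and last spot centres
dispensed_nl = NL_PER_SPOT * N_SLIDES      # volume used from one load
print(side, span_mm, dispensed_nl, LOADED_NL - dispensed_nl)
```

Under these assumptions a 30 x 30 grid spans about 11 mm per side, and one load dispenses roughly 200 nl of the 1 µl drawn up, so a single load is sufficient to print one sample onto all 40 slides.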

For the pooled chromosome experiment, the 16
chromosomes of Saccharomyces cerevisiae were separated
in a CHEF agarose gel apparatus (Biorad, Richmond, CA).
The six largest chromosomes were isolated in one gel
slice and the smallest 10 chromosomes in a second gel
slice. The DNA was recovered using a gel extraction
kit (Qiagen, Chatsworth, CA). The two chromosome pools
were randomly amplified in a manner similar to that
used for the target lambda clones. Following
amplification, 5 micrograms of each of the amplified
chromosome pools were separately random-primer labeled
using Klenow polymerase (Amersham, Arlington Heights,
IL) with a lissamine conjugated nucleotide analog
(Dupont NEN, Boston, MA) for the pool containing the
six largest chromosomes, and with a fluorescein
conjugated nucleotide analog (BMB) for the pool
containing the smallest ten chromosomes. The two pools
were mixed and concentrated using an ultrafiltration
device (Amicon, Danvers, MA).

Five micrograms of the hybridization probe
consisting of both chromosome pools in 7.5 µl of TE was
denatured in a boiling water bath and then snap cooled
on ice. 2.5 µl of concentrated hybridization solution
(5 x SSC and 0.1% SDS) was added and all 10 µl
transferred to the array surface, covered with a cover
slip, placed in a custom-built single-slide humidity
chamber and incubated at 60° for 12 hours. The slides
were then rinsed at room temperature in 0.1 x SSC and
0.1% SDS for 5 minutes, cover slipped and scanned.

A custom built laser fluorescent scanner was used
to detect the two-color hybridization signals from the
1.8 x 1.8 cm array at 20 micron resolution. The
scanned image was gridded and analyzed using custom
image analysis software. After correcting for optical
crosstalk between the fluorophores due to their
overlapping emission spectra, the red and green
hybridization values for each clone on the array were
correlated to the known physical map position of the
clone resulting in a computer-generated color karyotype
of the yeast genome.
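
The specification does not say how the optical crosstalk correction was carried out. One common way to model it, shown here only as a sketch under stated assumptions, is a linear mixture in which each detection channel picks up a fixed fraction of the other fluorophore's emission. The leak fractions below are hypothetical placeholders, not measured values.

```python
from typing import Tuple


def unmix(measured_red: float, measured_green: float,
          green_into_red: float, red_into_green: float) -> Tuple[float, float]:
    """Invert the assumed linear crosstalk model.

    Assumed model (not from the specification):
        measured_red   = red   + green_into_red * green
        measured_green = green + red_into_green * red
    """
    det = 1.0 - green_into_red * red_into_green
    if det == 0.0:
        raise ValueError("crosstalk fractions make the model non-invertible")
    red = (measured_red - green_into_red * measured_green) / det
    green = (measured_green - red_into_green * measured_red) / det
    return red, green


# Hypothetical leak fractions and readings for a single array element.
true_red, true_green = unmix(105.0, 55.0, green_into_red=0.10, red_into_green=0.05)
```

With these placeholder values the corrected intensities come out at about 100 (red) and 50 (green), which are the values that would produce the measured readings under the assumed model.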

Figure 6 shows the hybridization pattern of the
two chromosome pools. A red signal indicates that the
lambda clone on the array surface contains a cloned
genomic DNA segment from one of the largest six yeast
chromosomes. A green signal indicates that the lambda
clone insert comes from one of the smallest ten yeast
chromosomes. Orange signals indicate repetitive
sequences which cross hybridized to both chromosome
pools. Control spots on the array confirm that the
hybridization is specific and reproducible.
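
The color interpretation described above can be expressed as a simple per-spot rule. The fold threshold and the detection floor in this sketch are hypothetical; the specification gives no numerical criteria for calling a spot red, green or orange.

```python
def classify_spot(red: float, green: float, fold: float = 3.0, floor: float = 1.0) -> str:
    """Label one array element from its corrected red and green intensities.

    fold and floor are illustrative thresholds, not values from the specification.
    """
    if red < floor and green < floor:
        return "dark"      # no detectable hybridization in either channel
    if red >= fold * green:
        return "red"       # probe from the pool of the six largest chromosomes
    if green >= fold * red:
        return "green"     # probe from the pool of the ten smallest chromosomes
    return "orange"        # comparable signal from both pools


labels = [classify_spot(900, 100), classify_spot(100, 900),
          classify_spot(500, 400), classify_spot(0.2, 0.1)]
```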

The physical map locations of the genomic DNA
fragments contained in each of the clones used as array
elements have been previously determined by Olson and
co-workers (Riles, et al.) allowing for the automatic
generation of the color karyotype shown in Figure 7.
The color of a chromosomal section on the karyotype
corresponds to the color of the array element
containing the clone from that section. The black
regions of the karyotype represent false negative dark
spots on the array (10%) or regions of the genome not
covered by the Olson clone library (90%). Note that
the largest six chromosomes are mainly red while the
smallest ten chromosomes are mainly green, matching the
original CHEF gel isolation of the hybridization probe.
Areas of the red chromosomes containing green spots and
vice-versa are probably due to spurious sample tracking
errors in the formation of the original library and in
the amplification and spotting procedures.

The yeast genome arrays have also been probed with
individual clones or pools of clones that are
fluorescently labeled for physical mapping purposes.
The hybridization signals of these clones to the array
were translated into a position on the physical map of
yeast.


Example 2

Total cDNA Hybridized to Micro Arrays of
cDNA Clones with Two-Color
Fluorescent Detection

24 clones containing cDNA inserts from the plant
Arabidopsis were amplified using PCR. Salt was added
to the purified PCR products to a final concentration
of 3 x SSC. The cDNA clones were spotted on
poly-l-lysine coated microscope slides in a manner
similar to Example 1. Among the cDNA clones was a clone
representing a transcription factor HAT4, which had
previously been used to create a transgenic line of the
plant Arabidopsis, in which this gene is present at ten
times the level found in wild-type Arabidopsis (Schena,
et al., 1992).

Total poly-A mRNA from wild type Arabidopsis was
isolated using standard methods (Maniatis, et al.,
1989) and reverse transcribed into total cDNA, using
fluorescein nucleotide analog to label the cDNA product
(green fluorescence). A similar procedure was
performed with the transgenic line of Arabidopsis where
the transcription factor HAT4 was inserted into the
genome using standard gene transfer protocols. cDNA
copies of mRNA from the transgenic plant are labeled
with a lissamine nucleotide analog (red fluorescence).
Two micrograms of the cDNA products from each type of
plant were pooled together and hybridized to the cDNA
clone array in a 10 microliter hybridization reaction
in a manner similar to Example 1. Rinsing and
detection of hybridization was also performed in a
manner similar to Example 1. Fig. 8 shows the resulting
hybridization pattern of the array.

Genes equally expressed in wild type and the
transgenic Arabidopsis appeared yellow due to equal
contributions of the green and red fluorescence to the
final signal. The dots are different intensities of
yellow indicating various levels of gene expression.
The cDNA clone representing the transcription factor
HAT4, expressed in the transgenic line of Arabidopsis
but not detectably expressed in wild type Arabidopsis,
appears as a red dot (with the arrow pointing to it),
indicating the preferential expression of the
transcription factor in the red-labeled transgenic
Arabidopsis and the relative lack of expression of the
transcription factor in the green-labeled wild type
Arabidopsis.

An advantage of the microarray hybridization
format for gene expression studies is the high partial
concentration of each cDNA species achievable in the 10
microliter hybridization reaction. This high partial
concentration allows for detection of rare transcripts
without the need for PCR amplification of the
hybridization probe which may bias the true genetic
representation of each discrete cDNA species.

Gene expression studies such as these can be used
for genomics research to discover which genes are
expressed in which cell types, disease states,
development states or environmental conditions. Gene
expression studies can also be used for diagnosis of
disease by empirically correlating gene expression
patterns to disease states.

Example 3

Multiplexed Colorimetric Hybridization on
a Gridded Solid Support

A sheet of plastic-backed nitrocellulose was
gridded with barrier elements made from silicone rubber
according to the description in Section IV-A. The
sheet was soaked in 10 x SSC and allowed to dry. As
shown in Fig. 12, 192 M13 clones, each with a different
yeast insert, were arrayed 400 microns apart in four
quadrants of the solid support using the automated
device described in Section III. The bottom left
quadrant served as a negative control for hybridization
while each of the other three quadrants was hybridized
simultaneously with a different oligonucleotide using
the open-face hybridization technology described in
Section IV-A. The first two and last four elements of
each array are positive controls for the colorimetric
detection step.

The oligonucleotides were labeled with fluorescein
which was detected using an anti-fluorescein antibody
conjugated to alkaline phosphatase that precipitated an
NBT/BCIP dye on the solid support (Amersham). Perfect
matches between the labeled oligos and the M13 clones
resulted in dark spots visible to the naked eye and
detected using an optical scanner (HP ScanJet II)
attached to a personal computer. The hybridization
patterns are different in every quadrant indicating
that each oligo found several unique M13 clones from
among the 192 with a perfect sequence match. Note that
the open capillary printing tip leaves detectable
dimples on the nitrocellulose which can be used to
automatically align and analyze the images.

Although the invention has been described with
respect to specific embodiments and methods, it will be
clear that various changes and modifications may be made
without departing from the invention.
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IT IS CLAIMED: 



1. A method of forming a microarray of
analyte-assay regions on a solid support, where each
region in the array has a known amount of a selected,
analyte-specific reagent, said method comprising,

(a) loading a solution of a selected
analyte-specific reagent in a reagent-dispensing device
having an elongate capillary channel (i) formed by
spaced-apart, coextensive elongate members, (ii) adapted
to hold a quantity of the reagent solution and (iii)
having a tip region at which aqueous solution in the
channel forms a meniscus,

(b) tapping the tip of the dispensing device
against a solid support at a defined position on the
surface, with an impulse effective to break the
meniscus in the capillary channel and deposit a
selected volume of solution on the surface, and

(c) repeating steps (a) and (b) until said array
is formed.



2. The method of claim 1, wherein said tapping is
carried out with an impulse effective to deposit a
selected volume in the volume range between 0.01 to 100
nl.

3. The method of claim 1, wherein said channel is 
formed by a pair of spaced-apart tapered elements. 

4. The method of claim 1, for forming a plurality
of such arrays, wherein step (b) is applied to a
selected position on each of a plurality of solid
supports at each repeat cycle proceeding step (c).

5. The method of claim 1, which further includes,
after performing steps (a) and (b) at least one time,
reloading the reagent-dispensing device with a new
reagent solution by the steps of (i) dipping the
capillary channel of the device in a wash solution,
(ii) removing wash solution drawn into the capillary
channel, and (iii) dipping the capillary channel into
the new reagent solution.

6. Automated apparatus for forming a microarray
of analyte-assay regions on a plurality of solid
supports, where each region in the array has a known
amount of a selected, analyte-specific reagent, said
apparatus comprising

(a) a holder for holding, at known positions, a
plurality of planar supports,

(b) a reagent dispensing device having an open
capillary channel (i) formed by spaced-apart,
coextensive elongate members (ii) adapted to hold a
quantity of the reagent solution and (iii) having a tip
region at which aqueous solution in the channel forms a
meniscus,

(c) positioning means for positioning the
dispensing device at a selected array position with
respect to a support in said holder,

(d) dispensing means for moving the device into
tapping engagement against a support with a selected
impulse, when the device is positioned at a defined
array position with respect to that support, with an
impulse effective to break the meniscus of liquid in
the capillary channel and deposit a selected volume of
solution on the surface, and

(e) control means for controlling said positioning
and dispensing means.



7. The apparatus of claim 6, wherein said
dispensing means is effective to move said dispensing
device against a support with an impulse effective to
deposit a selected volume in the volume range between
0.01 to 100 nl.

8. The apparatus of claim 6, wherein said channel 
is formed by a pair of spaced-apart tapered elements. 

9. The apparatus of claim 6, wherein the control
means operates to (i) place the dispensing device at a
loading station, (ii) move the capillary channel in the
device into a selected reagent at the loading station,
to load the dispensing device with the reagent, and
(iii) dispense the reagent at a defined array position
on each of the supports on said holder.

10. The apparatus of claim 6, wherein the control
device further operates, at the end of a dispensing
cycle, to wash the dispensing device by (i) placing the
dispensing device at a washing station, (ii) moving the
capillary channel in the device into a wash fluid, to
load the dispensing device with the fluid, and (iii)
remove the wash fluid prior to loading the dispensing
device with a fresh selected reagent.

11. The apparatus of claim 6, wherein said device
is one of a plurality of such devices which are carried
on the arm for dispensing different analyte assay
reagents at selected spaced array positions.

12. A substrate with a surface having a
microarray of at least 10^3 distinct polynucleotide or
polypeptide biopolymers per 1 cm^2 surface area, each
distinct biopolymer sample (i) being disposed at a
separate, defined position in said array, (ii) having a
length of at least 50 subunits, and (iii) being present
in a defined amount between about 0.1 femtomole and 100
nanomoles.

13. The substrate of claim 12, wherein said
surface is a glass slide coated with polylysine, and
said biopolymers are polynucleotides.

14. The substrate of claim 12, wherein said 
substrate has a water-impermeable backing, a water- 
permeable film formed on the backing, and a grid formed 
on the film, where said grid (i) is composed of 
intersecting water-impervious grid elements extending 
from said backing to positions raised above the surface 
of said film, and (ii) partitions the film into a 
plurality of water-impervious cells, where each cell 
contains such a biopolymer array. 

15. A substrate with a surface array of
sample-receiving cells, comprising

a water-impermeable backing,

a water-permeable film formed on the backing, and
a grid formed on the film, said grid being composed of
intersecting water-impervious grid elements extending
from said backing to positions raised above the surface
of said film.

16. The substrate of claim 15, wherein the cells
of the array each contain an array of biopolymers.

17. A substrate for use in detecting binding of
labeled biopolymers to one or more of a plurality of
distinct polynucleotides, comprising






wo 95/35505 PCT/US95/07659 




a non-porous, glass substrate,

a coating of a cationic polymer on said substrate,
and

an array of distinct polynucleotides to said
coating, where each biopolymer is disposed at a
separate, defined position in a surface array of
biopolymers.

18. A method of detecting differential expression
of each of a plurality of genes in a first cell type
with respect to expression of the same genes in a
second cell type, said method comprising

producing fluorescence-labeled cDNA's from mRNA's
isolated from the two cell types, where the cDNA's
from the first and second cells are labeled with first
and second different fluorescent reporters,

adding a mixture of the labeled cDNA's from the
two cell types to an array of polynucleotides
representing a plurality of known genes derived from
the two cell types, under conditions that result in
hybridization of the cDNA's to complementary-sequence
polynucleotides in the array; and

examining the array by fluorescence under
fluorescence excitation conditions in which (i)
polynucleotides in the array that are hybridized
predominantly to cDNA's derived from one of the first
and second cell types give a distinct first or second
fluorescence emission color, respectively, and (ii)
polynucleotides in the array that are hybridized to
substantially equal numbers of cDNA's derived from the
first and second cell types give a distinct combined
fluorescence emission color, respectively,

wherein the relative expression of known genes in
the two cell types can be determined by the observed
fluorescence emission color of each spot.

19. The method of claim 18, wherein the array of
polynucleotides is formed on a substrate with a surface
having an array of at least 10^3 distinct polynucleotide
or polypeptide biopolymers in a surface area of less
than about 1 cm^2, each distinct biopolymer (i) being
disposed at a separate, defined position in said array,
(ii) having a length of at least 50 subunits, and (iii)
being present in a defined amount between about 0.1
femtomole and 100 nmoles.

20. The method of claim 19, wherein said surface
is a glass slide coated with polylysine, and said
biopolymers are polynucleotides non-covalently bound to
said polylysine.
ABSTRACT cDNA microarray technology is used to profile
complex diseases and discover novel disease-related genes. In
inflammatory disease such as rheumatoid arthritis, expression
patterns of diverse cell types contribute to the pathology. We
have monitored gene expression in this disease state with a
microarray of selected human genes of probable significance in
inflammation as well as with genes expressed in peripheral
human blood cells. Messenger RNA from cultured macrophages,
chondrocyte cell lines, primary chondrocytes, and synoviocytes
provided expression profiles for the selected cytokines,
chemokines, DNA binding proteins, and matrix-degrading
metalloproteinases. Comparisons between tissue samples of
rheumatoid arthritis and inflammatory bowel disease verified the
involvement of many genes and revealed novel participation of the
cytokine interleukin 3, chemokine Groα, and the
metalloproteinase matrix metallo-elastase in both diseases. From the
peripheral blood library, tissue inhibitor of metalloproteinase 1,
ferritin light chain, and manganese superoxide dismutase genes
were identified as expressed differentially in rheumatoid
arthritis compared with inflammatory bowel disease. These results
successfully demonstrate the use of the cDNA microarray system
as a general approach for dissecting human diseases.



The recently described cDNA microarray or DNA-chip tech- 
nology allows expression monitoring of hundreds and thou- 
sands of genes simultaneously and provides a format for 
identifying genes as well as changes in their activity (1, 2). 
Using this technology, two-color fluorescence patterns of 
differential gene expression in the root versus the shoot tissue 
of Arabidopsis were obtained in a specific array of 48 genes (1). 
In another study using a 1000 gene array from a human 
peripheral blood library, novel genes expressed by T cells were 
identified upon heat shock and protein kinase C activation (3). 

The technology uses cDNA sequences or cDNA inserts of a
library for PCR amplification that are arrayed on a glass slide with
high speed robotics at a density of 1000 cDNA sequences per cm².
These microarrays serve as gene targets for hybridization to 
cDNA probes prepared from RNA samples of cells or tissues. A 
two-color fluorescence labeling technique is used in the prepa- 
ration of the cDNA probes such that a simultaneous hybridization 
but separate detection of signals provides the comparative anal- 
ysis and the relative abundance of specific genes expressed (1, 2). 
Microarrays can be constructed from specific cDNA clones of 
interest, a cDNA library, or a select number of open reading 
frames from a genome sequencing database to allow a large-scale 
functional analysis of expressed sequences. 
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Because of the wide spectrum of genes and endogenous 
mediators involved, the microarray technology is well suited 
for analyzing chronic diseases. In rheumatoid arthritis (RA), 
inflammation of the joint is caused by the gene products of 
many different cell types present in the synovium and cartilage 
tissues plus those infiltrating from the circulating blood. The 
autoimmune and inflammatory nature of the disease is a 
cumulative result of genetic susceptibility factors and multiple 
responses, paracrine and autocrine in nature, from macro- 
phages, T cells, plasma cells, neutrophils, synovial fibroblasts, 
chondrocytes, etc. Growth factors, inflammatory cytokines 
(4), and the chemokines (5) are the important mediators of this 
inflammatory process. The ensuing destruction of the cartilage
and bone by the invading synovial tissue includes the actions
of prostaglandins and leukotrienes (6), and the matrix-degrading
metalloproteinases (MMPs). The MMPs are an important
class of Zn-dependent metallo-endoproteinases that can
collectively degrade the proteoglycan and collagen components of
the connective tissue matrix (7).

This paper presents a study in which the involvement of
select classes of molecules in RA was examined. Also
investigated were 1000 human genes randomly selected from a
peripheral human blood cell library. Their differential and 
quantitative expression analysis in cells of the joint tissue, in 
diseased RA tissue and in inflammatory bowel disease (IBD) 
tissues was conducted to demonstrate the utility of the mi- 
croarray method to analyze complex diseases by their pattern 
of gene expression. Such a survey provides insight not only into 
the underlying cause of the pathology, but also provides the 
opportunity to selectively target genes for disease intervention 
by appropriate drug development and gene therapies. 

METHODS 

Microarray Design, Development, and Preparation. Two ap- 
proaches for the fabrication of cDNA microarrays were used in 
this study. In the first approach, known human genes of probable 
significance in RA were identified. Regions of the clones, pref- 
erably 1 kb in length, were selected by their proximity to the 3' end 
of the cDNA and for areas of least identity to related and 
repetitive sequences. Primers were synthesized to amplify the
target regions by standard PCR protocols (3). Products were



Abbreviations: RA, rheumatoid arthritis; MMP, matrix-degrading
metalloproteinase; IBD, inflammatory bowel disease; LPS,
lipopolysaccharide; PMA, phorbol 12-myristate 13-acetate; TNF-α, tumor
necrosis factor α; IL, interleukin; TGF-β, transforming growth factor
β; GCSF, granulocyte colony-stimulating factor; MIP, macrophage
inflammatory protein; MIF, migration inhibitory factor; HME, human
matrix metallo-elastase; RANTES, regulated upon activation, normal
T cell expressed and secreted; Gel, gelatinase; VCAM, vascular cell
adhesion molecule; ICE, IL-1 converting enzyme; PUMP, putative
metalloproteinase; MnSOD, manganese superoxide dismutase; TIMP,
tissue inhibitor of metalloproteinase; MCP, macrophage chemotactic
protein.
verified by gel electrophoresis and purified with Qiaquick 96-well
purification kit (Qiagen, Chatsworth, CA), lyophilized (Savant),
and resuspended in 5 μl of 3× standard saline citrate (SSC) buffer
for arraying. In the second approach, the microarray containing 
the 1056 human genes from the peripheral blood lymphocyte 
library was prepared as described (3). 

Tissue Specimens. Rheumatoid synovial tissue was obtained 
from patients with late stage classic RA undergoing remedial 
synovectomy or arthroplasty of the knee. Synovial tissue was 
separated from any associated connective tissue or fat. One 
gram of each synovial specimen was subjected to RNA extrac- 
tion within 40 min of surgical excision, or explants were 
cultured in serum-free medium to examine any changes under 
in vitro conditions. For IBD, specimens of macroscopically 
inflamed lower intestinal mucosa were obtained from patients 
with Crohn disease undergoing remedial surgery. The hyper- 
trophied mucosal tissue was separated from underlying con- 
nective tissue and extracted for RNA. 

Cultured Cells. The Mono Mac-6 (MM6) monocytic cells
(8) were grown in RPMI medium. Human chondrosarcoma
SW1353 cells, primary human chondrocytes, and synoviocytes
(9, 10) were cultured in DMEM; all culture media were
supplemented with 10% fetal bovine serum, 100 μg/ml
streptomycin, and 500 units/ml penicillin. Treatment of cells with
lipopolysaccharide (LPS) endotoxin at 30 ng/ml, phorbol
12-myristate 13-acetate (PMA) at 50 ng/ml, tumor necrosis
factor α (TNF-α) at 50 ng/ml, interleukin (IL)-1β at 30 ng/ml,
or transforming growth factor-β (TGF-β) at 100 ng/ml is
described in the figure legends.



Fluorescent Probe, Hybridization, and Scanning. Isolation of
mRNA, probe preparation, and quantitation with Arabidopsis
control mRNAs were essentially as described (3) except for the
following minor modification. Following the reverse transcriptase
step, the appropriate Cy3- and Cy5-labeled samples were pooled;
mRNA was degraded by heating the sample to 65°C for 10 min with
the addition of 5 μl of 0.5 M NaOH plus 0.5 ml of 10 mM EDTA.
The pooled cDNA was purified from unincorporated nucleotides
by gel filtration in Centri-spin columns (Princeton Separations,
Adelphia, NJ). Samples were lyophilized and dissolved in 6 μl of
hybridization buffer (5× SSC plus 0.2% SDS). Hybridizations,
washes, scanning, quantitation procedures, and pseudocolor
representations of fluorescent images have been described (3). Scans
for the two fluorescent probes were normalized either to the
fluorescence intensity of Arabidopsis mRNAs spiked into the
labeling reactions (see Figs. 2-4) or to the signal intensity of
β-actin and glyceraldehyde-3-phosphate dehydrogenase
(GAPDH; see Fig. 5).
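
The normalization of the two scans to spiked control signals can be illustrated with a minimal sketch. It assumes that the scaling is a single ratio of mean control intensities; the actual procedure is the one described in ref. 3 and may differ. All function names and intensity values below are hypothetical.

```python
from statistics import mean


def normalization_factor(cy3_controls, cy5_controls):
    """Return the factor that rescales Cy5 signals onto the Cy3 scale.

    Assumption: the spiked control mRNAs were added in equal amounts to
    both labeling reactions, so their mean signals should be equal.
    """
    return mean(cy3_controls) / mean(cy5_controls)


def normalize_cy5(cy5_signals, factor):
    """Apply the control-derived factor to every Cy5 spot intensity."""
    return {gene: value * factor for gene, value in cy5_signals.items()}


# Hypothetical control-spot intensities for the two channels.
cy3_controls = [100.0, 200.0, 300.0]
cy5_controls = [50.0, 100.0, 150.0]

factor = normalization_factor(cy3_controls, cy5_controls)
normalized = normalize_cy5({"TNF": 400.0, "IL-6": 25.0}, factor)
```

A single global factor is the simplest choice; it corrects for overall differences in labeling and detection efficiency but not for intensity-dependent effects.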

RESULTS 

Ninety-Six-Gene Microarray Design. The actions of cytokines,
growth factors, chemokines, transcription factors, MMPs,
prostaglandins, and leukotrienes are well recognized in inflammatory
disease, particularly RA (11-14). Fig. 1 displays the selected genes
for this study and also includes control cDNAs of housekeeping
genes such as β-actin and GAPDH and genes from Arabidopsis
for signal normalization and quantitation (row A, columns 1-12).

Defining Microarray Assay Conditions. Different lengths and
concentrations of target DNA were tested by arraying PCR-
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Fig. 1. Ninety-six-element microarray design. The target element name and the corresponding gene are shown in the layout. Some genes have 
more than one target element to guarantee specificity of signal. For TNF the targets represent decreasing lengths of 1, 0.8, 0.6, 0.4, and 0.2 kb from 
left to right. 
amplified products ranging from 0.2 to 1.2 kb at concentrations
of 1 μg/μl or less. No significant difference in the signal levels was
observed within this range of target size and only with 0.2-kb
length was a signal reduced upon an 8-fold dilution of the 1 μg/μl
sample (data not shown). In this study the average length of the
targets was 1 kb, with a few exceptions in the range of ≈300 bp,
arrayed at a concentration of 1 μg/μl. Normally one PCR
provided sufficient material to fabricate up to 1000 microarray targets.

In considering positional effects in the development of the 
targets for the microarrays, selection was biased toward the 3' 
proximal regions, because the signal was reduced if the target 
fragment was biased toward the 5' end (data not shown). This 
result was anticipated since the hybridizing probe is prepared by
reverse transcription with oligo(dT)-primed mRNA and is richer
in 3' proximal sequences. Cross-hybridizations of probes to
targets of a gene family were analyzed with the matrix
metalloproteinases as the example because they can show regions of
sequence identities of greater than 70%. With collagenase-1
(Col-1) and collagenase-2 (Col-2) genes as targets with up to 70%
sequence identity, and stromelysin-1 (Strom-1) and stromelysin-2
(Strom-2) genes with different degrees of identity, our results
showed that a short region of overlap, even with 70-90%
sequence identity, produced a low level of cross-hybridization.
However, shorter regions of identity spread over the length of the
target resulted in cross-hybridization (data not shown). For
closely related genes, targets were designed by avoiding long
stretches of homology. For members of a gene family two or more
target regions were included to discriminate between specificity
of signal versus cross-hybridization.
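
As an illustration of how candidate target regions of related genes might be screened, the sketch below computes the overall percent identity and the longest uninterrupted identical stretch for two sequences that are assumed to be already aligned and of equal length. The function names and example sequences are hypothetical, and this is not the procedure used in the study.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of aligned positions at which the two sequences match."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def longest_identical_run(seq_a, seq_b):
    """Length of the longest stretch of consecutive matching positions."""
    best = run = 0
    for a, b in zip(seq_a, seq_b):
        run = run + 1 if a == b else 0
        best = max(best, run)
    return best


# Hypothetical pre-aligned fragments from two members of a gene family.
target_a = "ACGTACGTAC"
target_b = "ACGTTCGTAA"

identity = percent_identity(target_a, target_b)
longest = longest_identical_run(target_a, target_b)
```

Reporting both numbers separates a target pair with one short, highly similar overlap from a pair whose identity is distributed along the whole target, the two cases distinguished above.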

Monitoring Differential Expression in Cultured Cell Lines. In
RA tissue, the monocyte/macrophage population plays a prom- 
inent role in phagocytic and immunomodulatory activities. Typ- 
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Fig. 2. Time course for LPS/PMA-induced MM6 cells. Array elements are described in Fig. \.(A) Pseudocolor representations of fluorescent 
scans correspond to gene expression levels at each time point. The array is made up of % Arabidopsis control targets and 86 human cDNA targets, 
the majority of which are genes with known or suspected involvement in inflammation. The color bars provide a comparative calibration scale 
between arrays and are derived from the Arabidopsis mRNA samples that are introduced in equal amounts during probe preparation. Fluorescent 
probes were made by labeling mRNA from untreated MM6 cells or LPS and PMA treated cells. mRNA was isolated at indicated times after 
induction. {B l-llf) The two-color samples were cohybrtdized, and microarray scans provided the data for the levels of select transcripts at different 
time points relative to abundance at time zero. The analysis was performed using normalized data collected from 8-bit images. 
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ically these cells, when triggered by an immunogen, produce the 
proinflammatory cytokines TNF and IL-1. We have used the
monocyte cell line MM6 and monitored changes in gene expression
upon activation with LPS endotoxin, a component of
Gram-negative bacterial membranes, and PMA, which augments the
action of LPS on TNF production (15). RNA was isolated at
different times after induction and used for cDNA probe
preparation. From this time course it was clear that TNF expression
was induced within 15 min of treatment, reached maximum levels
in 1 hr, remained high until 4 hr, and subsequently declined (Fig.
2A). Many other cytokine genes were also transiently activated,
such as IL-1α and IL-6, and granulocyte colony-stimulating
factor (GCSF). Prominent chemokines activated were IL-8,
macrophage inflammatory protein (MIP)-1β more so than MIP-1α,
and Groα or melanoma growth stimulatory factor. Migration
inhibitory factor (MIF) expressed in the uninduced state declined
in LPS-activated cells. Of the immediate early genes, the
noticeable ones were c-fos, fra-1, c-jun, NF-κBp50, and IκB, with c-rel
expression observed even in the uninduced state (Fig. 2B). These
expression patterns are consistent with reported patterns of
activation of certain LPS- and PMA-induced genes (12).
Demonstrated here is the unique ability of this system to allow parallel
visualization of a large number of gene activities over a period of
time.

The SW1353 cell line is derived from malignant tumors of the
cartilage and behaves much like chondrocytes upon stimulation
with TNF and IL-1 in the expression of MMPs (9). In
addition to confirming our earlier observations with Northern
blots on Strom-1, Col-1, and Col-3 expression (9), gelatinase
(Gel) A, putative metalloproteinase (PUMP)-1, membrane-type
matrix metalloproteinase, and the tissue inhibitors of matrix
metalloproteinases, or tissue inhibitor of metalloproteinase 1
(TIMP-1), -2, and -3, were also expressed by these cells together
with the human matrix metallo-elastase (HME; Fig. 3A). HME
induction was estimated to be ≈50-fold and was greater than
any of the other MMPs examined (Fig. 3B). This result was
unexpected because HME is reportedly expressed only by
alveolar macrophage and placental cells (16). Expression of
the cytokines and chemokines IL-6, IL-8, MIF, and MIP-1β
was also noted. A variety of other genes, including certain
transcription factors, were also up-regulated (Fig. 3), but the
overall time-dependent expression of genes in the SW1353
cells was qualitatively distinct from the MM6 cells.

Quantitation of differential gene expression (Figs. 2B and
3B) was achieved with the simultaneous hybridization of
Cy3-labeled cDNA from untreated cells and Cy5-labeled
cDNA from treated samples. The estimated increases in
expression from these microarrays for a select number of genes
including IL-1β, IL-8, MIP-1β, TNF, HME, Col-1, Col-3,
Strom-1, and Strom-2 were compared with data collected from
dot blot analysis. Results (not shown) were in close agreement
and confirmed our earlier observations on the use of the
microarray method for the quantitation of gene expression (3).
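
The relative levels plotted against time zero can be thought of as ratios of the normalized treated (Cy5) signal to the untreated (Cy3) signal at each spot. The sketch below is a simplified illustration under that assumption; the background floor and all intensity values are hypothetical and are not taken from the study.

```python
def fold_change(cy5_treated, cy3_untreated, floor=1.0):
    """Ratio of treated to untreated signal for one array element.

    Both signals are clamped to a small floor (an assumed background
    level) so that near-zero intensities do not produce huge ratios.
    """
    return max(cy5_treated, floor) / max(cy3_untreated, floor)


# Hypothetical normalized (Cy5, Cy3) intensities for one gene over time.
time_course = {
    "0 hr": (10.0, 10.0),
    "1 hr": (500.0, 10.0),
    "4 hr": (450.0, 10.0),
}

ratios = {t: fold_change(cy5, cy3) for t, (cy5, cy3) in time_course.items()}
```

With these example values the ratio rises from 1 at time zero to 50 at 1 hr, the kind of change reported as an approximately 50-fold induction.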

Expression Profiles in Primary Chondrocytes and Synoviocytes
of Human RA Tissue. Given the sensitivity and the
specificity of this method, expression profiles of primary
synoviocytes and chondrocytes from diseased tissue were
examined. Without prior exposure to inducing agents, low level
expression of c-jun, GCSF, IL-3, TNF-β, MIF, and RANTES
(regulated upon activation, normal T cell expressed and
secreted) was seen as well as expression of MMPs, GelA,
Strom-1, Col-1, and the three TIMPs. In this case, Col-2
hybridization was considered to be nonspecific because the
second Col-2 target taken from the 3' end of the gene gave no

A. Human synovial fibroblasts B. Human articular chondrocytes 
Fig. 3. Time course for IL-1β and TNF-induced SW1353 cells
using the inflammation array (Fig. 1). (A) Pseudocolor representations
of fluorescent scans correspond to gene expression levels at each time
point. (B I-IV) Relative levels of selected genes at different time points
compared with time zero.
Fig. 4. Expression profiles for early passage primary synoviocytes and
chondrocytes isolated from RA tissue, cultured in the presence of 10%
fetal calf serum and activated with PMA and IL-1β, or TNF and IL-1β,
or TGF-β for 18 hr. The color bars provide a comparative calibration scale
between arrays and are derived from the Arabidopsis mRNA samples that
are introduced in equal amounts during probe preparation.
signal. Treatment with PMA and IL-1, more so than with TNF and
IL-1, produced a dramatic up-regulation in expression of
several genes in both of these primary cell types. These genes
are as follows: the cytokine IL-6; the chemokines IL-8 and
Gro-1α; the MMPs Strom-1, Col-1, Col-3, and HME; and
the adhesion molecule vascular cell adhesion molecule 1
(VCAM-1). The surprise again is HME expression in these
primary cells, for reasons discussed above. From these results,
the expression profiles of synoviocytes and the chondrocytes
appear very similar; the differences are more quantitative than
qualitative. Treatment of the primary chondrocytes with the
anabolic growth factor TGF-β had an interesting profile in that
it produced a remarkable down-regulation of genes expressed
in both the untreated and induced state (Fig. 4).

Given the demonstrated effectiveness of this technology, a 
comparative analysis of two different inflammatory disease 
states was conducted with probes made from RA tissue and 
IBD samples. RA samples were from late stage rheumatoid 
synovial tissue, and IBD specimens were obtained from in- 
flamed lower intestinal mucosa of patients with Crohn disease. 
With both the 96-element known gene microarray and the 
1000-gene microarray of cDNAs selected from a peripheral 
human blood cell library (3), distinct differences in gene 
expression patterns were evident. On the 96-gene array, RA 
tissue samples from different affected individuals gave similar 
profiles (data not shown) as did different samples from the 
same individual (Fig. 5). These patterns were notably similar 
to those observed with primary synoviocytes and chondrocytes 
(Fig. 4). Included in the list of prominently up-regulated genes 
are IL-6, the MMPs Strom-1, Col-1, GelA, HME, and in 
Fig. 5. Expression profiles of RA tissue (A) and IBD tissue (B).
mRNA from RA tissue samples obtained from the same individual was
isolated directly after excision (RA 21.5A) or maintained in culture
without serum for 2 hr (RA 21.5B) or for 6 hr (RA 21.5C). Profiles
from tissue samples of two other individuals (data not shown) were
remarkably similar to the ones shown here. IBD-A and IBD-CI are
from mRNA samples prepared directly after surgery from two
separate individuals. For the IBD-CII probe, the tissue sample was cultured
in medium without serum for 2 hr before mRNA preparation.



certain samples PUMP, TIMPs, particularly TIMP-1 and
TIMP-3, and the adhesion molecule VCAM. Discernible levels
of macrophage chemotactic protein 1 (MCP-1), MIF, and
RANTES were also noted. IBD samples were, in comparison,
rather subdued, although IL-1 converting enzyme (ICE),
TIMP-1, and MIF were notable in all three of the IBD
samples examined here. In IBD-A, one of three individual
samples, ICE, VCAM, Groα, and MMP expression was more
pronounced than in the others.

We also made use of a peripheral blood cDNA library (3) 
to identify genes expressed by lymphocytes infiltrating the 
inflamed tissues from the circulating blood. With the 1046- 
element array of randomly selected cDNAs from this library, 
probes made from RA and IBD samples showed hybridizations 
to a large number of genes. Of these, many were common 
between the two disease tissues while others were differentially 
expressed (data not shown). A complete survey of these genes 
was beyond the scope of this study, but for this report we 
picked three genes that were up-regulated in the RA tissue 
relative to IBD. These cDNAs were sequenced and identified 
by comparison to the GenBank database. They are TIMP-1, 
apoferritin light chain, and manganese superoxide dismutase
(MnSOD). Differential expression of MnSOD was only
observed in samples of RA tissue explants maintained in growth
medium without serum for between 2 and 16 hr. These
results also indicate that the expression profile of genes can be
altered when explants are transferred to culture conditions.

DISCUSSION 

The speed, ease, and feasibility of simultaneously monitoring
differential expression of hundreds of genes with the cDNA
microarray based system (1-3) is demonstrated here in the
analysis of a complex disease such as RA. Many different cell
types in the RA tissue (macrophages, lymphocytes, plasma cells,
neutrophils, synoviocytes, chondrocytes, etc.) are known to
contribute to the development of the disease with the expression of
gene products known to be proinflammatory. They include the
cytokines, chemokines, growth factors, MMPs, eicosanoids, and
others (7, 11-14), and the design of the 96-element known gene
microarray was based on this knowledge and depended on the
availability of the genes. The technology was validated by
confirming earlier observations on the expression of TNF by the
monocyte cell line MM6, and of Col-1 and Col-3 expression in the
chondrosarcoma cells and articular chondrocytes (9, 12). In our
time-dependent survey the chronological order of gene activities
in and between gene families was compared and the results have
provided unprecedented profiles of the cytokines (TNF, IL-1,
IL-6, GCSF, and MIF), chemokines (MIP-1α, MIP-1β, IL-8, and
Gro-1), certain transcription factors, and the matrix
metalloproteinases (GelA, Strom-1, Col-1, Col-3, HME) in the
macrophage cell line MM6 and in the SW1353 chondrosarcoma cells.

Earlier reports of cytokine production in the diseased state had
established a model in which TNF is a major participant in RA.
Its expression reportedly preceded that of the other cytokines and
effector molecules (4). Our results strongly support this model,
as demonstrated in the time course of the MM6 cells where TNF
induction preceded that of IL-1α and IL-1β, followed by IL-6 and
GCSF. These expression profiles demonstrate the utility of the
microarrays in determining the hierarchy of signaling events.

In the SW1353 chondrosarcoma cells, all the known MMPs and 
TIMPs were examined simultaneously. HME expression was 
discovered, which previously had been observed in only the 
stromal cells and alveolar macrophages of smoker's lungs and in 
placental tissue. Its presence in cells of the RA tissue is mean- 
ingful because its activity can cause significant destruction of 
elastin and basement membrane components (16, 17). Expression 
profiles of synovial fibroblasts and articular chondrocytes were 
remarkably similar and not too different from the SW1353 cells, 
indicating that the fibroblast and the chondrocyte can play equally 
aggressive roles in joint erosion. Prominent genes expressed were 
the MMPs, but chemokines and cytokines were also produced by
these cells. The effect of the anabolic growth factor TGF-β was
profoundly evident in demonstrating the down-regulation of these
catabolic activities.

RA tissue samples undeniably reflected profiles similar to
the cell types examined. Active genes observed were IL-3, IL-6,
ICE, the MMPs including HME and TIMPs, chemokines IL-8,
Groα, MIP, MIF, and RANTES, and the adhesion molecule
VCAM. Of the growth factors, fibroblast growth factor ^ was 
observed most frequently. In comparison, the expression 
patterns in the other inflammatory state (i.e., IBD) were not
as marked as in the RA samples, at least as obtained from the
tissue samples selected for this study.

As an alternative approach, the 1046 cDNA microarray of 
randomly selected genes from a lymphocyte library was used to 
identify genes expressed in RA tissue (3). Many genes on this
array hybridized with probes made from both RA and IBD tissue
samples. The results are not surprising because inflammatory
tissue is abundantly supplied with cell types infiltrating from the 
circulating blood, made apparent also by the high levels of 
chemokine expression in RA tissue. Because of the magnitude of 
the effort required to identify all the hybridized genes, we have for 
this report chosen to describe only three differentially expressed 
genes mainly to verify this method of analysis. 

Of the large number of genes observed here, a fair number
were already known as active participants in inflammatory
disease. These are TNF, IL-1, IL-6, IL-8, GCSF, RANTES, and
VCAM. The novel participants not previously reported are
HME, IL-3, ICE, and Groα. With our discovery of HME
expression in RA, this gene becomes a target for drug
intervention. ICE is a cysteine protease well known for its IL-1β
processing activity (18) and recognized for its role in apoptotic
cell death (19). Its expression in RA tissue is intriguing. IL-3 is
recognized for its growth-promoting activity in hematopoietic cell
lineages and is a product of activated T cells (20); its expression
in synoviocytes and chondrocytes of RA tissue is a novel observation.

Like IL-8, Groα is a C-X-C subgroup chemokine and is a
potent neutrophil and basophil chemoattractant. It
down-regulates the expression of types I and III interstitial collagens
(21, 22) and is seen here produced by the MM6 cells, in primary
synoviocytes, and in RA tissue. With the presence of RANTES,
MCP, and MIP-1β, the C-C chemokines (23), migration and
infiltration of monocytes, particularly T cells, into the tissue is
also enhanced (5), which aids the trafficking and recruitment of
leukocytes into the RA tissue. Their activation, phagocytosis,
degranulation, and respiratory bursts could be responsible for
the induction of MnSOD in RA. MnSOD is also induced by
TNF and IL-1 and serves a protective function against
oxidative damage. The induction of the ferritin light chain encoding
gene in this tissue may be for reasons similar to those for
MnSOD. Ferritin is the major intracellular iron storage protein
and it is responsive to intracellular oxidative stress and reactive
oxygen intermediates generated during inflammation (24, 25).
The active expression of TIMP-1 in RA tissue, as detected by
the 1000-element array, is no surprise because our results have
repeatedly shown TIMP-1 to be expressed in the constitutive
and induced states of RA cells and tissues.

The suitability of the cDNA microarray technology for 
profiling diseases and for identifying disease related genes is 
well documented here. This technology could provide new 



targets for drug development and disease therapies, and in 
doing so allow for improved treatment of chronic diseases that 
are challenging because of their complexity. 
MEASUREMENT OF GENE EXPRESSION PROFILES
IN TOXICITY DETERMINATION

Field of the Invention

The invention relates generally to methods for detecting and monitoring
phenotypic changes in in vitro and in vivo systems for assessing and/or determining
the toxicity of chemical compounds, and more particularly, the invention relates to a
method for detecting and monitoring changes in gene expression patterns in in vitro
and in vivo systems for determining the toxicity of drug candidates.

BACKGROUND 

The ability to rapidly and conveniently assess the toxicity of new compounds
is extremely important. Thousands of new compounds are synthesized every year,
and many are introduced to the environment through the development of new
commercial products and processes, often with little knowledge of their short term
and long term health effects. In the development of new drugs, the cost of assessing
the safety and efficacy of candidate compounds is becoming astronomical: It is
estimated that the pharmaceutical industry spends an average of about 300 million
dollars to bring a new pharmaceutical compound to market, e.g. Biotechnology, 13:
226-228 (1995). A large fraction of these costs are due to the failure of candidate
compounds in the later stages of the developmental process. That is, as the
assessment of a candidate drug progresses from the identification of a compound as a
drug candidate--for example, through relatively inexpensive binding assays or in vitro
screening assays, to pharmacokinetic studies, to toxicity studies, to efficacy studies in
model systems, to preliminary clinical studies, and so on, the costs of the associated
tests and analyses increase tremendously. Consequently, it may cost several tens of
millions of dollars to determine that a once promising candidate compound possesses
a side effect or cross reactivity that renders it commercially infeasible to develop
further. A great challenge of pharmaceutical development is to remove from further
consideration as early as possible those compounds that are likely to fail in the later
stages of drug testing.

Drug development programs are clearly structured with this objective in mind;
however, rapidly escalating costs have created a need to develop even more stringent
and less expensive screens in the early stages to identify false leads as soon as
possible. Toxicity assessment is an area where such improvements may be made, for
both drug development and for assessing the environmental, health, and safety effects
of new compounds in general.
Typically the toxicity of a compound is determined by administering the
compound to one or more species of test animal under controlled conditions and by
monitoring the effects on a wide range of parameters. The parameters include such
things as blood chemistry, weight gain or loss, a variety of behavioral patterns, muscle
tone, body temperature, respiration rate, lethality, and the like, which collectively
provide a measure of the state of health of the test animal. The degree of deviation of
such parameters from their normal ranges gives a measure of the toxicity of a
compound. Such tests may be designed to assess the acute, prolonged, or chronic
toxicity of a compound. In general, acute tests involve administration of the test
chemical on one occasion. The period of observation of the test animals may be as
short as a few hours, although it is usually at least 24 hours and in some cases it may
be as long as a week or more. In general, prolonged tests involve administration of
the test chemical on multiple occasions. The test chemical may be administered one
or more times each day, irregularly as when it is incorporated in the diet, at specific
times such as during pregnancy, or in some cases regularly but only at weekly
intervals. Also, in the prolonged test the experiment is usually conducted for not less
than 90 days in the rat or mouse or a year in the dog. In contrast to the acute and
prolonged types of test, the chronic toxicity tests are those in which the test chemical
is administered for a substantial portion of the lifetime of the test animal. In the case
of the mouse or rat, this is a period of 2 to 3 years. In the case of the dog, it is for 5 to
7 years.

Significant costs are incurred in establishing and maintaining large cohorts of
test animals for such assays, especially the larger animals in chronic toxicity assays.
Moreover, because of species specific effects, passing such toxicity tests does not
ensure that a compound is free of toxic effects when used in humans. Such tests do,
however, provide a standardized set of information for judging the safety of new
compounds, and they provide a database for giving preliminary assessments of related
compounds. An important area for improving toxicity determination would be the
identification of new observables which are predictive of the outcome of the
expensive and tedious animal assays.

In other medical fields, there has been significant interest in applying recent
advances in biotechnology, particularly in DNA sequencing, to the identification and
study of differentially expressed genes in healthy and diseased organisms, e.g. Adams
et al, Science, 252: 1651-1656 (1991); Matsubara et al, Gene, 135: 265-274 (1993);
Rosenberg et al, International patent application, PCT/US95/01863. The objectives
of such applications include increasing our knowledge of disease processes,
identifying genes that play important roles in the disease process, and providing
diagnostic and therapeutic approaches that exploit the expressed genes or their
products. While such approaches are attractive, those based on exhaustive, or even
sampled, sequencing of expressed genes are still beset by the enormous effort
required: It is estimated that 30-35 thousand different genes are expressed in a typical
mammalian tissue in any given state, e.g. Ausubel et al, Editors, Current Protocols,
5.8.1-5.8.4 (John Wiley & Sons, New York, 1992). Determining the sequences of
even a small sample of that number of gene products is a major enterprise, requiring
industrial-scale resources. Thus, the routine application of massive sequencing of
expressed genes is still beyond current commercial technology.

The availability of new assays for assessing the toxicity of compounds, such
as candidate drugs, that would provide more comprehensive and precise information
about the state of health of a test animal would be highly desirable. Such additional
assays would preferably be less expensive, more rapid, and more convenient than
current testing procedures, and would at the same time provide enough information to
make early judgments regarding the safety of new compounds.

Summary of the Invention 
An object of the invention is to provide a new approach to toxicity assessment
based on an examination of gene expression patterns, or profiles, in in vitro or in vivo
test systems.

Another object of the invention is to provide a database on which to base
decisions concerning the toxicological properties of chemicals, particularly drug
candidates.

A further object of the invention is to provide a method for analyzing gene
expression patterns in selected tissues of test animals.

A still further object of the invention is to provide a system for identifying
genes which are differentially expressed in response to exposure to a test compound.

Another object of the invention is to provide a rapid and reliable method for
correlating gene expression with short term and long term toxicity in test animals.

Another object of the invention is to identify genes whose expression is
predictive of deleterious toxicity.

The invention achieves these and other objects by providing a method for
massively parallel signature sequencing of genes expressed in one or more selected
tissues of an organism exposed to a test compound. An important feature of the
invention is the application of novel DNA sorting and sequencing methodologies that
permit the formation of gene expression profiles for selected tissues by determining
the sequence of portions of many thousands of different polynucleotides in parallel.
Such profiles may be compared with those from tissues of control organisms at single
or multiple time points to identify expression patterns predictive of toxicity.

The sorting methodology of the invention makes use of oligonucleotide tags
that are members of a minimally cross-hybridizing set of oligonucleotides. The
sequences of oligonucleotides of such a set differ from the sequences of every other
member of the same set by at least two nucleotides. Thus, each member of such a set
cannot form a duplex (or triplex) with the complement of any other member with less
than two mismatches. Complements of oligonucleotide tags of the invention, referred
to herein as "tag complements," may comprise natural nucleotides or non-natural
nucleotide analogs. Preferably, tag complements are attached to solid phase supports.
Such oligonucleotide tags when used with their corresponding tag complements
provide a means of enhancing specificity of hybridization for sorting polynucleotides,
such as cDNAs.

The polynucleotides to be sorted each have an oligonucleotide tag attached,
such that different polynucleotides have different tags. As explained more fully
below, this condition is achieved by employing a repertoire of tags substantially
greater than the population of polynucleotides and by taking a sufficiently small
sample of tagged polynucleotides from the full ensemble of tagged polynucleotides.
After such sampling, when the populations of supports and polynucleotides are mixed
under conditions which permit specific hybridization of the oligonucleotide tags with
their respective complements, identical polynucleotides sort onto particular beads or
regions. The sorted populations of polynucleotides can then be sequenced on the
solid phase support by a "single-base" or "base-by-base" sequencing methodology, as
described more fully below.

In one aspect, the method of the invention comprises the following steps: (a)
administering the compound to a test organism; (b) extracting a population of mRNA
molecules from each of one or more tissues of the test organism; (c) forming a
separate population of cDNA molecules from each population of mRNA molecules
extracted from the one or more tissues such that each cDNA molecule of the separate
populations has an oligonucleotide tag attached, the oligonucleotide tags being
selected from the same minimally cross-hybridizing set; (d) separately sampling each
population of cDNA molecules such that substantially all different cDNA molecules
within a separate population have different oligonucleotide tags attached; (e) sorting
the cDNA molecules of each separate population by specifically hybridizing the
oligonucleotide tags with their respective complements, the respective complements
being attached as uniform populations of substantially identical complements in
spatially discrete regions on one or more solid phase supports; (f) determining the
nucleotide sequence of a portion of each of the sorted cDNA molecules of each
separate population to form a frequency distribution of expressed genes for each of
the one or more tissues; and (g) correlating the frequency distribution of expressed
genes in each of the one or more tissues with the toxicity of the compound.

An important aspect of the invention is the identification of genes whose
expression is predictive of the toxicity of a compound. Once such genes are
identified, they may be employed in conventional assays, such as reverse transcriptase
polymerase chain reaction (RT-PCR) assays for gene expression.

Brief Description of the Drawings 
Figure 1 is a flow chart representation of an algorithm for generating
minimally cross-hybridizing sets of oligonucleotides.

Figure 2 diagrammatically illustrates an apparatus for carrying out
polynucleotide sequencing in accordance with the invention.

Definitions 

"Complement" or "tag complement" as used herein in reference to
oligonucleotide tags refers to an oligonucleotide to which an oligonucleotide tag
specifically hybridizes to form a perfectly matched duplex or triplex. In embodiments
where specific hybridization results in a triplex, the oligonucleotide tag may be
selected to be either double stranded or single stranded. Thus, where triplexes are
formed, the term "complement" is meant to encompass either a double stranded
complement of a single stranded oligonucleotide tag or a single stranded complement
of a double stranded oligonucleotide tag.

The term "oligonucleotide" as used herein includes linear oligomers of natural
or modified monomers or linkages, including deoxyribonucleosides, ribonucleosides,
anomeric forms thereof, peptide nucleic acids (PNAs), and the like, capable of
specifically binding to a target polynucleotide by way of a regular pattern of
monomer-to-monomer interactions, such as Watson-Crick type of base pairing, base
stacking, Hoogsteen or reverse Hoogsteen types of base pairing, or the like. Usually
monomers are linked by phosphodiester bonds or analogs thereof to form
oligonucleotides ranging in size from a few monomeric units, e.g. 3-4, to several tens
of monomeric units. Whenever an oligonucleotide is represented by a sequence of
letters, such as "ATGCCTG," it will be understood that the nucleotides are in 5'->3'
order from left to right and that "A" denotes deoxyadenosine, "C" denotes
deoxycytidine, "G" denotes deoxyguanosine, and "T" denotes thymidine, unless
otherwise noted. Analogs of phosphodiester linkages include phosphorothioate,
phosphorodithioate, phosphoranilidate, phosphoramidate, and the like. Usually
oligonucleotides of the invention comprise the four natural nucleotides; however, they
may also comprise non-natural nucleotide analogs. It is clear to those skilled in the
art when oligonucleotides having natural or non-natural nucleotides may be
employed, e.g. where processing by enzymes is called for, usually oligonucleotides
consisting of natural nucleotides are required.

"Perfectly matched" in reference to a duplex means that the poly- or 
oligonucleotide strands making up the duplex form a double stranded structure with
one another such that every nucleotide in each strand undergoes Watson-Crick
basepairing with a nucleotide in the other strand. The term also comprehends the
pairing of nucleoside analogs, such as deoxyinosine, nucleosides with 2-aminopurine
bases, and the like, that may be employed. In reference to a triplex, the term means
that the triplex consists of a perfectly matched duplex and a third strand in which
every nucleotide undergoes Hoogsteen or reverse Hoogsteen association with a
basepair of the perfectly matched duplex. Conversely, a "mismatch" in a duplex
between a tag and an oligonucleotide means that a pair or triplet of nucleotides in the
duplex or triplex fails to undergo Watson-Crick and/or Hoogsteen and/or reverse
Hoogsteen bonding.

As used herein, "nucleoside" includes the natural nucleosides, including
2'-deoxy and 2'-hydroxyl forms, e.g. as described in Kornberg and Baker, DNA
Replication, 2nd Ed. (Freeman, San Francisco, 1992). "Analogs" in reference to
nucleosides includes synthetic nucleosides having modified base moieties and/or
modified sugar moieties, e.g. described by Scheit, Nucleotide Analogs (John Wiley,
New York, 1980); Uhlmann and Peyman, Chemical Reviews, 90: 543-584 (1990); or
the like, with the only proviso that they are capable of specific hybridization. Such
analogs include synthetic nucleosides designed to enhance binding properties, reduce
complexity, increase specificity, and the like.

As used herein "sequence determination" or "determining a nucleotide
sequence" in reference to polynucleotides includes determination of partial as well as
full sequence information of the polynucleotide. That is, the term includes sequence
comparisons, fingerprinting, and like levels of information about a target
polynucleotide, as well as the express identification and ordering of nucleosides,
usually each nucleoside, in a target polynucleotide. The term also includes the
determination of the identification, ordering, and locations of one, two, or three of the
four types of nucleotides within a target polynucleotide. For example, in some
embodiments sequence determination may be effected by identifying the ordering and
locations of a single type of nucleotide, e.g. cytosines, within the target polynucleotide
"CATCGC ..." so that its sequence is represented as a binary code, e.g. "100101 ..." for
"C-(not C)-(not C)-C-(not C)-C ..." and the like.

As used herein, the term "complexity" in reference to a population of
polynucleotides means the number of different species of molecule present in the
population.

As used herein, the terms "gene expression profile" and "gene expression
pattern," which are used equivalently, mean a frequency distribution of sequences of
portions of cDNA molecules sampled from a population of tag-cDNA conjugates.
Generally, the portions of sequence are sufficiently long to uniquely identify the
cDNA from which the portion arose. Preferably, the total number of sequences
determined is at least 1000; more preferably, the total number of sequences
determined in a gene expression profile is at least ten thousand.
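
As a rough illustration of this definition, a gene expression profile can be held as a table of relative frequencies of the sequenced signatures. The sketch below assumes the signatures are already available as short strings; the function name `expression_profile` and the sample reads are hypothetical and are not taken from the specification.

```python
from collections import Counter


def expression_profile(signatures):
    """Return the relative frequency of each distinct signature sequence."""
    counts = Counter(signatures)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one signature is required")
    return {sig: n / total for sig, n in counts.items()}


# Hypothetical signatures read from sorted tag-cDNA conjugates.
reads = ["GATCAC", "GATCAC", "TTGACC", "GATCAC", "CCATGA"]
profile = expression_profile(reads)
```

In practice the profile would be built from at least a thousand signatures, as the definition states; five reads are used here only to keep the example readable.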

As used herein, "test organism" means any in vitro or in vivo system which
provides measurable responses to exposure to test compounds. Typically, test
organisms may be mammalian cell cultures, particularly of specific tissues, such as
hepatocytes, neurons, kidney cells, colony forming cells, or the like, or test organisms
may be whole animals, such as rats, mice, hamsters, guinea pigs, dogs, cats, rabbits,
pigs, monkeys, and the like.

Detailed Description of the Invention 
The invention provides a method for determining the toxicity of a compound
by analyzing changes in the gene expression profiles in selected tissues of test
organisms exposed to the compound. The invention also provides a method of
identifying toxicity markers consisting of individual genes or a group of genes that is
expressed acutely and which is correlated with prolonged or chronic toxicity, or
suggests that the compound will have an undesirable cross reactivity. Gene
expression profiles are generated by sequencing portions of cDNA molecules
constructed from mRNA extracted from tissues of test organisms exposed to the
compound being tested. As used herein, the term "tissue" is employed with its usual
medical or biological meaning, except that in reference to an in vitro test system, such
as a cell culture, it simply means a sample from the culture. Gene expression profiles
derived from test organisms are compared to gene expression profiles derived from
control organisms to determine the genes which are differentially expressed in the test
organism because of exposure to the compound being tested. In both cases, the
sequence information of the gene expression profiles is obtained by massively parallel
signature sequencing of cDNAs, which is implemented in steps (c) through (f) of the
above method.
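
The comparison of test and control profiles can be pictured with a short sketch. The specification does not prescribe a particular statistic, so the log2 ratio of smoothed relative frequencies used here is only one plausible choice; the function name, the pseudocount, and the gene labels are all assumptions made for illustration.

```python
import math


def log2_fold_changes(test_counts, control_counts, pseudocount=1.0):
    """Compare two signature-count tables gene by gene.

    Counts are converted to smoothed relative frequencies so that a gene
    absent from one profile does not cause a division by zero. The
    pseudocount is expected to be positive.
    """
    genes = sorted(set(test_counts) | set(control_counts))
    test_total = sum(test_counts.values()) + pseudocount * len(genes)
    ctrl_total = sum(control_counts.values()) + pseudocount * len(genes)
    changes = {}
    for gene in genes:
        test_freq = (test_counts.get(gene, 0) + pseudocount) / test_total
        ctrl_freq = (control_counts.get(gene, 0) + pseudocount) / ctrl_total
        changes[gene] = math.log2(test_freq / ctrl_freq)
    return changes


# Hypothetical signature counts for a treated and a control tissue sample.
treated = {"geneA": 30, "geneB": 10}
control = {"geneA": 10, "geneB": 10}
changes = log2_fold_changes(treated, control)
```

A positive value marks a gene that is relatively more abundant in the treated sample and a negative value one that is relatively less abundant; deciding which differences are meaningful would require replicate samples and a proper statistical test, which this sketch does not attempt.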

Toxicity Assessment 
Procedures for designing and conducting toxicity tests in in vitro and in vivo
systems are well known, and are described in many texts on the subject, such as Loomis
et al, Loomis's Essentials of Toxicology, 4th Ed. (Academic Press, New York, 1996);
Ecobichon, The Basics of Toxicity Testing (CRC Press, Boca Raton, 1992); Frazier,
editor, In Vitro Toxicity Testing (Marcel Dekker, New York, 1992); and the like.

In toxicity testing, two groups of test organisms are usually employed: one 
group serves as a control and the other group receives the test compound in a single 
dose (for acute toxicity tests) or a regimen of doses (for prolonged or chronic toxicity 
tests). Since in most cases, the extraction of tissue as called for in the method of the 
invention requires sacrificing the test animal, both the control group and the group 
receiving compound must be large enough to permit removal of animals for sampling 
tissues, if it is desired to observe the dynamics of gene expression through the 
duration of an experiment. 

In setting up a toxicity study, extensive guidance is provided in the literature
for selecting the appropriate test organism for the compound being tested, route of
administration, dose ranges, and the like. Water or physiological saline (0.9% NaCl
in water) is the solvent of choice for the test compound since these solvents permit
administration by a variety of routes. When this is not possible because of solubility
limitations, it is necessary to resort to the use of vegetable oils such as corn oil or
even organic solvents, of which propylene glycol is commonly used. Whenever
possible the use of suspensions or emulsions should be avoided except for oral
administration. Regardless of the route of administration, the volume required to
administer a given dose is limited by the size of the animal that is used. It is desirable
to keep the volume of each dose uniform within and between groups of animals.
When rats or mice are used the volume administered by the oral route should not
exceed 0.005 ml per gram of animal. Even when aqueous or physiological saline
solutions are used for parenteral injection the volumes that are tolerated are limited,
although such solutions are ordinarily thought of as being innocuous. The
intravenous LD50 of distilled water in the mouse is approximately 0.044 ml per gram
and that of isotonic saline is 0.068 ml per gram of mouse.
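
The oral volume limit quoted above translates into a simple per-animal calculation. The sketch below only applies the 0.005 ml per gram figure from the text; the function name and the example body weights are illustrative assumptions.

```python
MAX_ORAL_ML_PER_GRAM = 0.005  # oral limit quoted in the text for rats or mice


def max_oral_volume_ml(body_weight_g):
    """Largest oral dose volume, in ml, for an animal of the given weight."""
    if body_weight_g <= 0:
        raise ValueError("body weight must be positive")
    return MAX_ORAL_ML_PER_GRAM * body_weight_g


# Assumed example weights: a 25 g mouse and a 250 g rat.
mouse_limit = max_oral_volume_ml(25)
rat_limit = max_oral_volume_ml(250)
```

Under these assumed weights the limits come to about 0.125 ml for the mouse and 1.25 ml for the rat.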

When a compound is to be administered by inhalation, special techniques for 
generating test atmospheres are necessary. Dose estimation becomes very 
complicated. The methods usually involve aerosolization or nebulization of fluids 
containing the compound. If the agent to be tested is a fluid that has an appreciable 
vapor pressure, it may be administered by passing air through the solution under 
controlled temperature conditions. Under these conditions, dose is estimated from the 
volume of air inhaled per unit time, the temperature of the solution, and the vapor 
pressure of the agent involved. Gases are metered from reservoirs. When particles of 
a solution are to be administered, unless the particle size is less than about 2 µm the
particles will not reach the terminal alveolar sacs in the lungs. A variety of
apparatuses and chambers are available to perform studies for detecting effects of
irritant or other toxic endpoints when they are administered by inhalation. The
preferred method of administering an agent to animals is via the oral route, either by 
intubation or by incorporating the agent in the feed. 

Preferably, in designing a toxicity assessment, two or more species should be
employed that handle the test compound as similarly to man as possible in terms of
metabolism, absorption, excretion, tissue storage, and the like. Preferably, multiple
doses or regimens at different concentrations should be employed to establish a
dose-response relationship with respect to toxic effects. And preferably, the route of
administration to the test animal should be the same as, or as similar as possible to,
the route of administration of the compound to man. Effects obtained by one route of
administration to test animals are not a priori applicable to effects by another route of
administration to man. For example, food additives for man should be tested by
admixture of the material in the diet of the test animals.

Acute toxicity tests consist of administering a compound to test organisms on
one occasion. The purpose of such a test is to determine the symptomatology
consequent to administration of the compound and to determine the degree of lethality
of the compound. The initial procedure is to perform a series of range-finding doses
of the compound in a single species. This necessitates selection of a route of
administration, preparation of the compound in a form suitable for administration by
the selected route, and selection of an appropriate species. Preferably, initial acute
toxicity studies are performed on either rats or mice because of their low cost, their
availability, and the availability of abundant toxicologic reference data on these
species. Prolonged toxicity tests consist of administering a compound to test
organisms repeatedly, usually on a daily basis, over a period of 3 to 4 months. Two
practical factors are encountered that place constraints on the design of such tests:
First, the available routes of administration are limited because the route selected
must be suitable for repeated administration without inducing harmful effects. And
second, blood, urine, and perhaps other samples, should be taken repeatedly without
inducing significant harm to the test animals. Preferably, in the method of the
invention the gene expression profiles are obtained in conjunction with the
measurement of the traditional toxicologic parameters, such as listed in the table
below:

Hematology                    Blood Chemistry                          Urine Analyses

erythrocyte count             sodium                                   pH
total leukocyte count         potassium                                specific gravity
differential leukocyte count  chloride                                 total protein
hematocrit                    calcium                                  sediment
hemoglobin                    carbon dioxide                           glucose
                              serum glutamic-pyruvic transaminase      ketones
                              serum glutamic-oxaloacetic transaminase  bilirubin
                              serum protein electrophoresis
                              blood sugar
                              blood urea nitrogen
                              total serum protein
                              serum albumin
                              total serum bilirubin


Oligonucleotide Tags and Tag Complements

Oligonucleotide tags are members of a minimally cross-hybridizing set of
oligonucleotides. The sequences of oligonucleotides of such a set differ from the
sequences of every other member of the same set by at least two nucleotides. Thus,
each member of such a set cannot form a duplex (or triplex) with the complement of
any other member with less than two mismatches. Complements of oligonucleotide
tags, referred to herein as "tag complements," may comprise natural nucleotides or
non-natural nucleotide analogs. Preferably, tag complements are attached to solid
phase supports. Such oligonucleotide tags when used with their corresponding tag
complements provide a means of enhancing specificity of hybridization for sorting,
tracking, or labeling molecules, especially polynucleotides.
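
The defining property of a minimally cross-hybridizing set can be checked mechanically. The sketch below treats "differ by at least two nucleotides" as a position-by-position mismatch count between equal-length sequences, which is a simplifying assumption: it ignores duplex thermodynamics and alignment shifts. The function names and the four example tags are hypothetical.

```python
from itertools import combinations


def mismatches(a, b):
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def is_minimally_cross_hybridizing(tags, min_diff=2):
    """True if every pair of tags differs at min_diff or more positions."""
    return all(mismatches(a, b) >= min_diff for a, b in combinations(tags, 2))


# Illustrative tags only; not a set listed in the specification.
example_tags = ["AAAA", "AATT", "TTAA", "TTTT"]
ok_for_two = is_minimally_cross_hybridizing(example_tags, min_diff=2)
ok_for_three = is_minimally_cross_hybridizing(example_tags, min_diff=3)
```

The example set passes with a required difference of two but fails with three, because "AAAA" and "AATT" differ at only two positions.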

Minimally cross-hybridizing sets of oligonucleotide tags and tag complements
may be synthesized either combinatorially or individually depending on the size of the
set desired and the degree to which cross-hybridization is sought to be minimized (or
stated another way, the degree to which specificity is sought to be enhanced). For
example, a minimally cross-hybridizing set may consist of a set of individually
synthesized 10-mer sequences that differ from each other by at least 4 nucleotides,
such set having a maximum size of 332 (when composed of 3 kinds of nucleotides
and counted using a computer program such as disclosed in Appendix Ic).
Alternatively, a minimally cross-hybridizing set of oligonucleotide tags may also be
assembled combinatorially from subunits which themselves are selected from a
minimally cross-hybridizing set. For example, a set of minimally cross-hybridizing
12-mers differing from one another by at least three nucleotides may be synthesized
by assembling 3 subunits selected from a set of minimally cross-hybridizing 4-mers
that each differ from one another by three nucleotides. Such an embodiment gives a
maximally sized set of 9^3, or 729, 12-mers. The number 9 is the number of
oligonucleotides listed by the computer program of Appendix Ia, which assumes, as
with the 10-mers, that only 3 of the 4 different types of nucleotides are used. The set
is described as "maximal" because the computer programs of Appendices Ia-c provide
the largest set for a given input (e.g. length, composition, difference in number of
nucleotides between members). Additional minimally cross-hybridizing sets may be
formed from subsets of such calculated sets.
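
The combinatorial assembly described in this paragraph can be reproduced in a few lines. The nine 4-mers below were chosen for this sketch so that every pair differs at three or more positions; they are an assumption made for illustration and are not the list produced by the program of Appendix Ia.

```python
from itertools import combinations, product

# Nine illustrative 4-mers over three nucleotides (A, C, T); every pair
# differs at three or more positions. Not taken from the appendices.
SUBUNITS = ["AAAA", "ACCC", "ATTT", "CACT", "CCTA",
            "CTAC", "TATC", "TCAT", "TTCA"]


def mismatches(a, b):
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def assemble_tags(subunits, subunits_per_tag):
    """Concatenate every ordered choice of subunits into a tag."""
    return ["".join(parts) for parts in product(subunits, repeat=subunits_per_tag)]


def min_pairwise_mismatches(sequences):
    """Smallest mismatch count over all pairs of sequences."""
    return min(mismatches(a, b) for a, b in combinations(sequences, 2))


tags = assemble_tags(SUBUNITS, 3)
```

Here `len(tags)` is 729, matching the 9^3 figure in the text. Two distinct tags must differ in at least one subunit position, and any two distinct subunits differ at three or more nucleotides, so the assembled 12-mers inherit the three-nucleotide minimum difference.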

Oligonucleotide tags may be single stranded and be designed for specific
hybridization to single stranded tag complements by duplex formation or for specific
hybridization to double stranded tag complements by triplex formation.
Oligonucleotide tags may also be double stranded and be designed for specific
hybridization to single stranded tag complements by triplex formation.

When synthesized combinatorially, an oligonucleotide tag preferably consists
of a plurality of subunits, each subunit consisting of an oligonucleotide of 3 to 9
nucleotides in length wherein each subunit is selected from the same minimally
cross-hybridizing set. In such embodiments, the number of oligonucleotide tags available
depends on the number of subunits per tag and on the length of the subunits. The
number is generally much less than the number of all possible sequences the length of
the tag, which for a tag n nucleotides long would be 4^n.
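
A small calculation makes the gap between the tag repertoire and the full sequence space concrete. The sketch assumes a set of 9 subunits, each 4 nucleotides long, which is the example used elsewhere in the text; the function names are hypothetical.

```python
def tag_repertoire_size(subunit_set_size, subunits_per_tag):
    """Number of tags formed by concatenating subunits from one set."""
    return subunit_set_size ** subunits_per_tag


def all_possible_sequences(tag_length):
    """Number of all sequences of the given length over four nucleotides."""
    return 4 ** tag_length


# Assumed example: 9 subunits of length 4, four subunits per tag.
repertoire = tag_repertoire_size(9, 4)       # tags of 16 nucleotides
sequence_space = all_possible_sequences(16)  # all 16-mers
```

With these inputs the repertoire holds 6561 tags, while there are 4^16 = 4,294,967,296 possible 16-mers, so only a very small fraction of the sequence space is used as tags.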

Complements of oligonucleotide tags attached to a solid phase support are
used to sort polynucleotides from a mixture of polynucleotides each containing a tag.
Complements of the oligonucleotide tags are synthesized on the surface of a solid
phase support, such as a microscopic bead or a specific location on an array of
synthesis locations on a single support, such that populations of identical sequences
are produced in specific regions. That is, the surface of each support, in the case of a
bead, or of each region, in the case of an array, is derivatized by only one type of
complement which has a particular sequence. The population of such beads or regions
contains a repertoire of complements with distinct sequences. As used herein in
reference to oligonucleotide tags and tag complements, the term "repertoire" means
the minimally cross-hybridizing set of oligonucleotides that make up the tags in
a particular embodiment or the corresponding set of tag complements.

The polynucleotides to be sorted each have an oligonucleotide tag attached,
such that different polynucleotides have different tags. As explained more fully
below, this condition is achieved by employing a repertoire of tags substantially
greater than the population of polynucleotides and by taking a sufficiently small
sample of tagged polynucleotides from the full ensemble of tagged polynucleotides.
After such sampling, when the populations of supports and polynucleotides are mixed
under conditions which permit specific hybridization of the oligonucleotide tags with
their respective complements, identical polynucleotides sort onto particular beads or
regions.
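
Why a small sample helps can be seen from a simple probability estimate. The sketch below assumes that each sampled conjugate carries a tag drawn independently and uniformly from the repertoire; this model, the function name, and the example numbers are assumptions made for illustration and are not the treatment given in the specification.

```python
def expected_unique_fraction(repertoire_size, sample_size):
    """Expected fraction of sampled molecules whose tag is shared with no
    other sampled molecule, under a uniform, independent tagging model."""
    if repertoire_size < 1 or sample_size < 1:
        raise ValueError("sizes must be positive")
    return (1.0 - 1.0 / repertoire_size) ** (sample_size - 1)


# Assumed example: a repertoire of 8**8 tags sampled at about 1% and at 50%.
repertoire = 8 ** 8
small_sample = expected_unique_fraction(repertoire, repertoire // 100)
large_sample = expected_unique_fraction(repertoire, repertoire // 2)
```

Under this model a sample of about 1% of the repertoire size leaves roughly 99% of the sampled molecules with a tag of their own, whereas a sample of half the repertoire size leaves only about 61%, which is the intuition behind taking a sufficiently small sample.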

The nucleotide sequences of oligonucleotides of a minimally cross-hybridizing
set are conveniently enumerated by simple computer programs, such as those
exemplified by programs whose source codes are listed in Appendices Ia and Ib.
Program minhx of Appendix Ia computes all minimally cross-hybridizing sets having
4-mer subunits composed of three kinds of nucleotides. Program tagN of Appendix
Ib enumerates longer oligonucleotides of a minimally cross-hybridizing set. Similar
algorithms and computer programs are readily written for listing oligonucleotides of
minimally cross-hybridizing sets for any embodiment of the invention. Table I below
provides guidance as to the size of sets of minimally cross-hybridizing
oligonucleotides for the indicated lengths and number of nucleotide differences. The
above computer programs were used to generate the numbers.

Table I

                 Nucleotide Difference
                 between Oligonucleotides   Maximal Size of    Size of           Size of
Oligonucleotide  of Minimally Cross-        Minimally Cross-   Repertoire with   Repertoire with
Length           Hybridizing Set            Hybridizing Set    Four Words        Five Words

 4               3                          9                  6561              5.90 x 10^4
 6               3                          27                 5.3 x 10^5        1.43 x 10^7
 7               4                          27                 5.3 x 10^5        1.43 x 10^7
 7               5                          8                  4096              3.28 x 10^4
 8               3                          190                1.30 x 10^9       2.48 x 10^11
 8               4                          62                 1.48 x 10^7       9.16 x 10^8
 8               5                          18                 1.05 x 10^5       1.89 x 10^6
 9               5                          39                 2.31 x 10^6       9.02 x 10^7
10               5                          332                1.21 x 10^10
10               6                          28                 6.15 x 10^5       1.72 x 10^7
11               5                          187
18               6                          ≈25000
18               12                         24

For some embodiments of the invention, where extremely large repertoires of
tags are not required, oligonucleotide tags of a minimally cross-hybridizing set may
be separately synthesized. Sets containing several hundred to several thousands, or
even several tens of thousands, of oligonucleotides may be synthesized directly by a
variety of parallel synthesis approaches, e.g. as disclosed in Frank et al, U.S. patent
4,689,405; Frank et al, Nucleic Acids Research, 11: 4365-4377 (1983); Matson et al,
Anal. Biochem., 224: 110-116 (1995); Fodor et al, International application
PCT/US93/04145; Pease et al, Proc. Natl. Acad. Sci., 91: 5022-5026 (1994);
Southern et al, J. Biotechnology, 35: 217-227 (1994); Brennan, International
application PCT/US94/05896; Lashkari et al, Proc. Natl. Acad. Sci., 92: 7912-7915
(1995); or the like.

Preferably, oligonucleotide tags of the invention are synthesized
combinatorially out of subunits between three and six nucleotides in length and
selected from the same minimally cross-hybridizing set. For oligonucleotides in this
range, the members of such sets may be enumerated by computer programs based on
the algorithm of Fig. 1.

The algorithm of Fig. 1 is implemented by first defining the characteristics of
the subunits of the minimally cross-hybridizing set, i.e. length, number of base
differences between members, and composition, e.g. do they consist of two, three, or
four kinds of bases. A table Mn, n=1, is generated (100) that consists of all possible
sequences of a given length and composition. An initial subunit S1 is selected and
compared (120) with successive subunits Si for i=n+1 to the end of the table.
Whenever a successive subunit has the required number of mismatches to be a
member of the minimally cross-hybridizing set, it is saved in a new table Mn+1 (125),
that also contains subunits previously selected in prior passes through step 120. For
example, in the first set of comparisons, M2 will contain S1; in the second set of
comparisons, M3 will contain S1 and S2; in the third set of comparisons, M4 will
contain S1, S2, and S3; and so on. Similarly, comparisons in table Mj will be
between Sj and all successive subunits in Mj. Note that each successive table Mn+1
is smaller than its predecessors as subunits are eliminated in successive passes
through step 130. After every subunit of table Mn has been compared (140) the old
table is replaced by the new table Mn+1, and the next round of comparisons is
begun. The process stops (160) when a table Mn is reached that contains no
successive subunits to compare to the selected subunit Si, i.e. Mn=Mn+1.
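
The table-sieving procedure described above can be sketched in a few lines of Python. This is only an illustrative reading of the prose, not the program of Fig. 1 itself: the function names and the lexicographic ordering of the initial table are assumptions, and a single greedy pass of this kind is not guaranteed to find the largest possible set for every length and mismatch requirement.

```python
from itertools import combinations, product

def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))

def minimally_cross_hybridizing_set(length, min_mismatches, alphabet="AGT"):
    # Table M1: all possible words of the given length and composition
    # (lexicographic order is an assumption made for this sketch).
    table = ["".join(p) for p in product(alphabet, repeat=length)]
    selected = 0
    while selected < len(table):
        pivot = table[selected]
        # Keep the words already selected, plus every later word that has
        # the required number of mismatches with the current pivot.
        survivors = [s for s in table[selected + 1:]
                     if mismatches(pivot, s) >= min_mismatches]
        table = table[:selected + 1] + survivors
        selected += 1
    # The loop ends when no successive words remain to compare.
    return table

words = minimally_cross_hybridizing_set(4, 3, "AGT")
print(len(words), words)
```

For 4-mers built from three kinds of bases with at least three mismatches, this pass yields nine words, which agrees with the maximal set size tabulated for word length 4.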

Preferably, minimally cross-hybridizing sets comprise subunits that make
approximately equivalent contributions to duplex stability as every other subunit in
the set. In this way, the stability of perfectly matched duplexes between every subunit
and its complement is approximately equal. Guidance for selecting such sets is
provided by published techniques for selecting optimal PCR primers and calculating
duplex stabilities, e.g. Rychlik et al, Nucleic Acids Research, 17: 8543-8551 (1989)
and 18: 6409-6412 (1990); Breslauer et al, Proc. Natl. Acad. Sci., 83: 3746-3750
(1986); Wetmur, Crit. Rev. Biochem. Mol. Biol., 26: 227-259 (1991); and the like.
For shorter tags, e.g. about 30 nucleotides or less, the algorithm described by Rychlik
and Wetmur is preferred, and for longer tags, e.g. about 30-35 nucleotides or greater,
an algorithm disclosed by Suggs et al, pages 683-693 in Brown, editor, ICN-UCLA
Symp. Dev. Biol., Vol. 23 (Academic Press, New York, 1981) may be conveniently
employed. Clearly, there are many approaches available to one skilled in the art for
designing sets of minimally cross-hybridizing subunits within the scope of the
invention. For example, to minimize the effects of different base-stacking energies of
terminal nucleotides when subunits are assembled, subunits may be provided that
have the same terminal nucleotides. In this way, when subunits are linked, the sum of
the base-stacking energies of all the adjoining terminal nucleotides will be the same,
thereby reducing or eliminating variability in tag melting temperatures.

A "word" of terminal nucleotides, shown in italic below, may also be added to
each end of a tag so that a perfect match is always formed between it and a similar
terminal "word" on any other tag complement. Such an augmented tag would have
the form:

    W    W1    W2   ...  Wk-1    Wk    W
    W'   W1'   W2'  ...  Wk-1'   Wk'   W'

where the primed W's indicate complements. With ends of tags always forming
perfectly matched duplexes, all mismatched words will be internal mismatches,
thereby reducing the stability of tag-complement duplexes that otherwise would have
mismatched words at their ends. It is well known that duplexes with internal
mismatches are significantly less stable than duplexes with the same mismatch at a
terminus.

Preferred minimally cross-hybridizing sets are those whose
subunits are made up of three of the four natural nucleotides. As will be discussed
more fully below, the absence of one type of nucleotide in the oligonucleotide tags
permits target polynucleotides to be loaded onto solid phase supports by use of the
5'->3' exonuclease activity of a DNA polymerase. The following is an exemplary
minimally cross-hybridizing set of subunits each comprising four nucleotides selected
from the group consisting of A, G, and T:

Table II

Word:       w1     w2     w3     w4
Sequence:   GATT   TGAT   TAGA   TTTG

Word:       w5     w6     w7     w8
Sequence:   GTAA   AGTA   ATGT   AAAG



In this set, each member would form a duplex having three mismatched bases with
the complement of every other member.
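
The three-mismatch property stated for the set of Table II can be checked directly. The short Python sketch below assumes the eight words read GATT, TGAT, TAGA, TTTG, GTAA, AGTA, ATGT, and AAAG (the sixth word is taken as AGTA because the set is said to use only A, G, and T); the helper names are illustrative. A mismatch between one word and the complement of another occurs exactly where the two word sequences differ, so comparing the words position by position is sufficient.

```python
from itertools import combinations

# Words w1..w8 as read from Table II; w6 = "AGTA" is an assumption where the scan is unclear.
TABLE_II = ["GATT", "TGAT", "TAGA", "TTTG", "GTAA", "AGTA", "ATGT", "AAAG"]

def mismatches(a: str, b: str) -> int:
    """Positions at which word a would mispair with the complement of word b."""
    return sum(x != y for x, y in zip(a, b))

def pairwise_mismatch_counts(words):
    """Mismatch count for every unordered pair of distinct words."""
    return {(a, b): mismatches(a, b) for a, b in combinations(words, 2)}

counts = pairwise_mismatch_counts(TABLE_II)
print(min(counts.values()), max(counts.values()))
```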

Further exemplary minimally cross-hybridizing sets are listed below in Table
III. Clearly, additional sets can be generated by substituting different groups of
nucleotides, or by using subsets of known minimally cross-hybridizing sets.

Table III

Exemplary Minimally Cross-Hybridizing Sets of 4-mer Subunits

Set 1    Set 2    Set 3    Set 4    Set 5    Set 6

CATT     ACCC     AAAC     AAAG     AACA     AACG
CTAA     AGGG     ACCA     ACCA     ACAC     ACAA
TCAT     CACG     AGGG     AGGC     AGGG     AGGC
ACTA     CCGA     CACG     CACC     CAAG     CAAC
TACA     CGAC     CCGC     CCGG     CCGC     CCGG
TTTC     GAGC     CGAA     CGAA     CGCA     CGCA
ATCT     GCAG     GAGA     GAGA     GAGA     GAGA
AAAC     GGCA     GCAG     GCAC     GCCG     GCCC
         AAAA     GGCC     GGCG     GGAC     GGAG

Set 7    Set 8    Set 9    Set 10   Set 11   Set 12

AAGA     AAGC     AAGG     ACAG     ACCG     ACGA
ACAC     ACAA     ACAA     AACA     AAAA     AAAC
AGCG     AGCG     AGCC     AGGC     AGGC     AGCG
         CAAG     CAAC     CAAC     CACC     CACA
CCCA     CCCC     CCCG     CCGA     CCGA     CCAG
CGGC     CGGA     CGGA     CGCG     CGAG     CGGC
GACC     GACA     GACA     GAGG     GAGG     GAGG
GCGG     GCGG     GCGC     GCCC     GCAC     GCCC
GGAA     GGAC     GGAG     GGAA     GGCA     GGAA



The oligonucleotide tags of the invention and their complements are
conveniently synthesized on an automated DNA synthesizer, e.g. an Applied
Biosystems, Inc. (Foster City, California) model 392 or 394 DNA/RNA Synthesizer,
using standard chemistries, such as phosphoramidite chemistry, e.g. disclosed in the
following references: Beaucage and Iyer, Tetrahedron, 48: 2223-2311 (1992); Molko
et al, U.S. patent 4,980,460; Koster et al, U.S. patent 4,725,677; Caruthers et al, U.S.
patents 4,415,732; 4,458,066; and 4,973,679; and the like. Alternative chemistries,
e.g. resulting in non-natural backbone groups, such as phosphorothioate,
phosphoramidate, and the like, may also be employed provided that the resulting
oligonucleotides are capable of specific hybridization. In some embodiments, tags
may comprise naturally occurring nucleotides that permit processing or manipulation
by enzymes, while the corresponding tag complements may comprise non-natural
nucleotide analogs, such as peptide nucleic acids, or like compounds, that promote the
formation of more stable duplexes during sorting.

When microparticles are used as supports, repertoires of oligonucleotide tags
and tag complements may be generated by subunit-wise synthesis via "split and mix"
techniques, e.g. as disclosed in Shortle et al, International patent application
PCT/US93/03418 or Lyttle et al, Biotechniques, 19: 274-280 (1995). Briefly, the
basic unit of the synthesis is a subunit of the oligonucleotide tag. Preferably,
phosphoramidite chemistry is used and 3' phosphoramidite oligonucleotides are
prepared for each subunit in a minimally cross-hybridizing set, e.g. for the set first
listed above, there would be eight 4-mer 3'-phosphoramidites. Synthesis proceeds as
disclosed by Shortle et al or in direct analogy with the techniques employed to
generate diverse oligonucleotide libraries using nucleosidic monomers, e.g. as
disclosed in Telenius et al, Genomics, 13: 718-725 (1992); Welsh et al, Nucleic Acids
Research, 19: 5275-5279 (1991); Grothues et al, Nucleic Acids Research, 21: 1321-1322
(1993); Hartley, European patent application 90304496.4; Lam et al, Nature,
354: 82-84 (1991); Zuckerman et al, Int. J. Pept. Protein Research, 40: 498-507
(1992); and the like. Generally, these techniques simply call for the application of
mixtures of the activated monomers to the growing oligonucleotide during the
coupling steps. Preferably, oligonucleotide tags and tag complements are synthesized
on a DNA synthesizer having a number of synthesis chambers which is greater than or
equal to the number of different kinds of words used in the construction of the tags.
That is, preferably there is a synthesis chamber corresponding to each type of word.
In this embodiment, words are added nucleotide-by-nucleotide, such that if a word
consists of five nucleotides there are five monomer couplings in each synthesis
chamber. After a word is completely synthesized, the synthesis supports are removed
from the chambers, mixed, and redistributed back to the chambers for the next cycle
of word addition. This latter embodiment takes advantage of the high coupling yields
of monomer addition, e.g. in phosphoramidite chemistries.
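
The chamber-per-word cycle described above can be illustrated with a toy simulation. The following Python sketch is an assumption-laden model rather than a synthesis protocol: beads are assigned to chambers uniformly at random, coupling is treated as perfectly efficient, the direction of chain growth is ignored, and the function name `split_and_mix` is purely illustrative. The word list is the eight-word set of Table II as read here (w6 taken as AGTA).

```python
import random

# Table II words as read from the specification (w6 = "AGTA" is an assumption).
WORDS = ["GATT", "TGAT", "TAGA", "TTTG", "GTAA", "AGTA", "ATGT", "AAAG"]

def split_and_mix(words, cycles, n_beads, seed=0):
    """Return the tag carried by each bead after `cycles` rounds of word addition."""
    rng = random.Random(seed)
    beads = [""] * n_beads
    for _ in range(cycles):
        # Split: every bead goes to one chamber; chamber i couples words[i].
        chamber_of = [rng.randrange(len(words)) for _ in beads]
        # Couple: append the chamber's word to each bead's growing tag.
        beads = [tag + words[i] for tag, i in zip(beads, chamber_of)]
        # Mix: pooling is implicit, since chambers are re-drawn on the next cycle.
    return beads

tags = split_and_mix(WORDS, cycles=9, n_beads=1000)
print(len(set(tags)), "distinct tags on", len(tags), "beads")
print("possible tags:", len(WORDS) ** 9)
```

Each bead ends up carrying exactly one tag sequence, while the pool as a whole samples the full repertoire of (number of words) raised to the (number of cycles) possible tags.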

Double stranded forms of tags may be made by separately synthesizing the
complementary strands followed by mixing under conditions that permit duplex
formation. Alternatively, double stranded tags may be formed by first synthesizing a
single stranded repertoire linked to a known oligonucleotide sequence that serves as a
primer binding site. The second strand is then synthesized by combining the single
stranded repertoire with a primer and extending with a polymerase. This latter
approach is described in Oliphant et al, Gene, 44: 177-183 (1986). Such duplex tags
may then be inserted into cloning vectors along with target polynucleotides for sorting
and manipulation of the target polynucleotide in accordance with the invention.

When tag complements are employed that are made up of nucleotides that
have enhanced binding characteristics, such as PNAs or oligonucleotide N3'->P5'
phosphoramidates, sorting can be implemented through the formation of D-loops
between tags comprising natural nucleotides and their PNA or phosphoramidate
complements, as an alternative to the "stripping" reaction employing the 3'->5'
exonuclease activity of a DNA polymerase to render a tag single stranded.

Oligonucleotide tags of the invention may range in length from 12 to 60
nucleotides or basepairs. Preferably, oligonucleotide tags range in length from 18 to
40 nucleotides or basepairs. More preferably, oligonucleotide tags range in length
from 25 to 40 nucleotides or basepairs. In terms of preferred and more preferred
numbers of subunits, these ranges may be expressed as follows:



Table IV

Numbers of Subunits in Tags in Preferred Embodiments

Monomers
in Subunit       Nucleotides in Oligonucleotide Tag

                 (12-60)           (18-40)           (25-40)

    3            4-20 subunits     6-13 subunits     8-13 subunits
    4            3-15 subunits     4-10 subunits     6-10 subunits
    5            2-12 subunits     3-8 subunits      5-8 subunits
    6            2-10 subunits     3-6 subunits      4-6 subunits
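
The entries of Table IV appear to follow from whole-number division of each tag length bound by the subunit length. That rule is inferred from the tabulated values rather than stated in the specification, and the function name below is illustrative. A minimal Python sketch:

```python
def subunit_range(tag_length_range, monomers_per_subunit):
    """Subunit counts for a tag length range, rounding both bounds down.

    Rounding down is an assumption inferred from the values in Table IV.
    """
    low, high = tag_length_range
    return low // monomers_per_subunit, high // monomers_per_subunit

for monomers in (3, 4, 5, 6):
    row = [subunit_range(r, monomers) for r in ((12, 60), (18, 40), (25, 40))]
    print(monomers, row)
```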



Most preferably, oligonucleotide tags are single stranded and specific hybridization 
occurs via Watson-Crick pairing with a tag complement. 

Preferably, repertoires of single stranded oligonucleotide tags of the invention 
contain at least 100 members; more preferably, repertoires of such tags contain at 
least 1000 members; and most preferably, repertoires of such tags contain at least 
10,000 members. 



Triplex Tags

In embodiments where specific hybridization occurs via triplex formation,
coding of tag sequences follows the same principles as for duplex-forming tags;
however, there are further constraints on the selection of subunit sequences.
Generally, third strand association via Hoogsteen type of binding is most stable along
homopyrimidine-homopurine tracks in a double stranded target. Usually, base triplets
form in T-A*T or C-G*C motifs (where "-" indicates Watson-Crick pairing and
"*" indicates Hoogsteen type of binding); however, other motifs are also possible. For
example, Hoogsteen base pairing permits parallel and antiparallel orientations
between the third strand (the Hoogsteen strand) and the purine-rich strand of the
duplex to which the third strand binds, depending on conditions and the composition
of the strands. There is extensive guidance in the literature for selecting appropriate
sequences, orientation, conditions, nucleoside type (e.g. whether ribose or
deoxyribose nucleosides are employed), base modifications (e.g. methylated cytosine,
and the like) in order to maximize, or otherwise regulate, triplex stability as desired in
particular embodiments, e.g. Roberts et al, Proc. Natl. Acad. Sci., 88: 9397-9401
(1991); Roberts et al, Science, 258: 1463-1466 (1992); Roberts et al, Proc. Natl.
Acad. Sci., 93: 4320-4325 (1996); Distefano et al, Proc. Natl. Acad. Sci., 90: 1179-1183
(1993); Mergny et al, Biochemistry, 30: 9791-9798 (1991); Cheng et al, J. Am.
Chem. Soc., 114: 4465-4474 (1992); Beal and Dervan, Nucleic Acids Research, 20:
2773-2776 (1992); Beal and Dervan, J. Am. Chem. Soc., 114: 4976-4982 (1992);
Giovannangeli et al, Proc. Natl. Acad. Sci., 89: 8631-8635 (1992); Moser and Dervan,
Science, 238: 645-650 (1987); McShan et al, J. Biol. Chem., 267: 5712-5721 (1992);
Yoon et al, Proc. Natl. Acad. Sci., 89: 3840-3844 (1992); Blume et al, Nucleic Acids
Research, 20: 1777-1784 (1992); Thuong and Helene, Angew. Chem. Int. Ed. Engl.
32: 666-690 (1993); Escude et al, Proc. Natl. Acad. Sci., 93: 4365-4369 (1996); and
the like. Conditions for annealing single-stranded or duplex tags to their single-stranded
or duplex complements are well known, e.g. Ji et al, Anal. Chem. 65: 1323-1328
(1993); Cantor et al, U.S. patent 5,482,836; and the like. Use of triplex tags has
the advantage of not requiring a "stripping" reaction with polymerase to expose the
tag for annealing to its complement.

Preferably, oligonucleotide tags of the invention employing triplex
hybridization are double stranded DNA and the corresponding tag complements are
single stranded. More preferably, 5-methylcytosine is used in place of cytosine in the
tag complements in order to broaden the range of pH stability of the triplex formed
between a tag and its complement. Preferred conditions for forming triplexes are
fully disclosed in the above references. Briefly, hybridization takes place in
concentrated salt solution, e.g. 1.0 M NaCl, 1.0 M potassium acetate, or the like, at
pH below 5.5 (or 6.5 if 5-methylcytosine is employed). Hybridization temperature
depends on the length and composition of the tag; however, for an 18-20-mer tag or
longer, hybridization at room temperature is adequate. Washes may be conducted
with less concentrated salt solutions, e.g. 10 mM sodium acetate, 100 mM MgCl2, pH
5.8, at room temperature. Tags may be eluted from their tag complements by
incubation in a similar salt solution at pH 9.0.

Minimally cross-hybridizing sets of oligonucleotide tags that form triplexes
may be generated by the computer program of Appendix Ic, or similar programs. An
exemplary set of double stranded 8-mer words is listed below in capital letters with
the corresponding complements in small letters. Each such word differs from each of
the other words in the set by three base pairs.

Table V

Exemplary Minimally Cross-Hybridizing
Set of Double Stranded 8-mer Tags

5'-AAGGAGAG    5'-AAAGGGGA    5'-AGAGAAGA    5'-AGGGGGGG
3'-TTCCTCTC    3'-TTTCCCCT    3'-TCTCTTCT    3'-TCCCCCCC
3'-ttcctctc    3'-tttcccct    3'-tctcttct    3'-tccccccc

5'-AAAAAAAA    5'-AAGAGAGA    5'-AGGAAAAG    5'-GAAAGGAG
3'-TTTTTTTT    3'-TTCTCTCT    3'-TCCTTTTC    3'-CTTTCCTC
3'-tttttttt    3'-ttctctct    3'-tccttttc    3'-ctttcctc

5'-AAAAAGGG    5'-AGAAGAGG    5'-AGGAAGGA    5'-GAAGAAGG
3'-TTTTTCCC    3'-TCTTCTCC    3'-TCCTTCCT    3'-CTTCTTCC
3'-tttttccc    3'-tcttctcc    3'-tccttcct    3'-cttcttcc

5'-AAAGGAAG    5'-AGAAGGAA    5'-AGGGGAAA    5'-GAAGAGAA
3'-TTTCCTTC    3'-TCTTCCTT    3'-TCCCCTTT    3'-CTTCTCTT
3'-tttccttc    3'-tcttcctt    3'-tccccttt    3'-cttctctt


Table VI

Repertoire Size of Various Double Stranded Tags
That Form Triplexes with Their Tag Complements

                  Nucleotide Difference
                  between Oligonucleotides   Maximal Size of     Size of            Size of
Oligonucleotide   of Minimally Cross-        Minimally Cross-    Repertoire with    Repertoire with
Word Length       Hybridizing Set            Hybridizing Set     Four Words         Five Words

      4                  2                         8             4096               3.2 x 10^4
      6                  3                         8             4096               3.2 x 10^4
      8                  3                        16             6.5 x 10^4         1.05 x 10^6
     10                  5                         8             4096
     15                  5                        92
     20                  6                       765
     20                  8                        92
     20                 10                        22
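
The repertoire sizes tabulated above follow from the set size alone: a tag assembled from k words, each chosen independently from a minimally cross-hybridizing set of m words, can take m^k distinct sequences. A minimal Python sketch of that arithmetic (the function name `repertoire_size` is illustrative, not taken from the specification):

```python
def repertoire_size(set_size: int, words_per_tag: int) -> int:
    """Number of distinct tags made of `words_per_tag` words drawn from `set_size` words."""
    return set_size ** words_per_tag

for set_size in (8, 16):
    four = repertoire_size(set_size, 4)
    five = repertoire_size(set_size, 5)
    print(f"set of {set_size}: four words -> {four}, five words -> {five:.2e}")
```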







Preferably, repertoires of double stranded oligonucleotide tags of the invention
contain at least 10 members; more preferably, repertoires of such tags contain at least
100 members. Preferably, words are between 4 and 8 nucleotides in length for
combinatorially synthesized double stranded oligonucleotide tags, and oligonucleotide
tags are between 12 and 60 base pairs in length. More preferably, such tags are
between 18 and 40 base pairs in length.

Solid Phase Supports 
Solid phase supports for use with the invention may have a wide variety of
forms, including microparticles, beads, membranes, slides, plates, micromachined
chips, and the like. Likewise, solid phase supports of the invention may comprise a
wide variety of compositions, including glass, plastic, silicon, alkanethiolate-derivatized
gold, cellulose, low cross-linked and high cross-linked polystyrene, silica
gel, polyamide, and the like. Preferably, either a population of discrete particles is
employed such that each has a uniform coating, or population, of complementary
sequences of the same tag (and no other), or a single or a few supports are employed
with spatially discrete regions each containing a uniform coating, or population, of
complementary sequences to the same tag (and no other). In the latter embodiment,
the area of the regions may vary according to particular applications; usually, the
regions range in area from several µm², e.g. 3-5, to several hundred µm², e.g. 100-500.
Preferably, such regions are spatially discrete so that signals generated by
events, e.g. fluorescent emissions, at adjacent regions can be resolved by the detection
system being employed. In some applications, it may be desirable to have regions
with uniform coatings of more than one tag complement, e.g. for simultaneous
sequence analysis, or for bringing separately tagged molecules into close proximity.
Tag complements may be used with the solid phase support that they are
synthesized on, or they may be separately synthesized and attached to a solid phase
support for use, e.g. as disclosed by Lund et al, Nucleic Acids Research, 16: 10861-10880
(1988); Albretsen et al, Anal. Biochem., 189: 40-50 (1990); Wolf et al, Nucleic
Acids Research, 15: 2911-2926 (1987); or Ghosh et al, Nucleic Acids Research, 15:
5353-5372 (1987). Preferably, tag complements are synthesized on and used with the
same solid phase support, which may comprise a variety of forms and include a
variety of linking moieties. Such supports may comprise microparticles or arrays, or
matrices, of regions where uniform populations of tag complements are synthesized.
A wide variety of microparticle supports may be used with the invention, including
microparticles made of controlled pore glass (CPG), highly cross-linked polystyrene,
acrylic copolymers, cellulose, nylon, dextran, latex, polyacrolein, and the like,
disclosed in the following exemplary references: Meth. Enzymol., Section A, pages
11-147, vol. 44 (Academic Press, New York, 1976); U.S. patents 4,678,814;
4,413,070; and 4,046,720; and Pon, Chapter 19, in Agrawal, editor, Methods in
Molecular Biology, Vol. 20 (Humana Press, Totowa, NJ, 1993). Microparticle
supports further include commercially available nucleoside-derivatized CPG and
polystyrene beads (e.g. available from Applied Biosystems, Foster City, CA);
derivatized magnetic beads; polystyrene grafted with polyethylene glycol (e.g.
TentaGel™, Rapp Polymere, Tubingen, Germany); and the like. Selection of the
support characteristics, such as material, porosity, size, shape, and the like, and the
type of linking moiety employed depends on the conditions under which the tags are
used. For example, in applications involving successive processing with enzymes,
supports and linkers that minimize steric hindrance of the enzymes and that facilitate
access to substrate are preferred. Other important factors to be considered in selecting
the most appropriate microparticle support include size uniformity, efficiency as a
synthesis support, degree to which surface area is known, and optical properties, e.g. as
explained more fully below, clear smooth beads provide instrumentational advantages
when handling large numbers of beads on a surface.

Exemplary linking moieties for attaching and/or synthesizing tags on
microparticle surfaces are disclosed in Pon et al, Biotechniques, 6: 768-775 (1988);
Webb, U.S. patent 4,659,774; Barany et al, International patent application
PCT/US91/06103; Brown et al, J. Chem. Soc. Commun., 1989: 891-893; Damha et
al, Nucleic Acids Research, 18: 3813-3821 (1990); Beattie et al, Clinical Chemistry,
39: 719-722 (1993); Maskos and Southern, Nucleic Acids Research, 20: 1679-1684
(1992); and the like.

As mentioned above, tag complements may also be synthesized on a single
(or a few) solid phase support to form an array of regions uniformly coated with tag
complements. That is, within each region in such an array the same tag complement
is synthesized. Techniques for synthesizing such arrays are disclosed in McGall et al,
International application PCT/US93/03767; Pease et al, Proc. Natl. Acad. Sci., 91:
5022-5026 (1994); Southern and Maskos, International application
PCT/GB89/01114; Maskos and Southern (cited above); Southern et al, Genomics, 13:
1008-1017 (1992); and Maskos and Southern, Nucleic Acids Research, 21: 4663-4669
(1993).

Preferably, the invention is implemented with microparticles or beads
uniformly coated with complements of the same tag sequence. Microparticle supports
and methods of covalently or noncovalently linking oligonucleotides to their surfaces
are well known, as exemplified by the following references: Beaucage and Iyer (cited
above); Gait, editor, Oligonucleotide Synthesis: A Practical Approach (IRL Press,
Oxford, 1984); and the references cited above. Generally, the size and shape of a
microparticle is not critical; however, microparticles in the size range of a few, e.g. 1-2,
to several hundred, e.g. 200-1000 µm diameter are preferable, as they facilitate the
construction and manipulation of large repertoires of oligonucleotide tags with
minimal reagent and sample usage.

In some preferred applications, commercially available controlled-pore glass
(CPG) or polystyrene supports are employed as solid phase supports in the invention.
Such supports come available with base-labile linkers and initial nucleosides attached,
e.g. Applied Biosystems (Foster City, CA). Preferably, microparticles having pore
size between 500 and 1000 angstroms are employed.

In other preferred applications, non-porous microparticles are employed for
their optical properties, which may be advantageously used when tracking large
numbers of microparticles on planar supports, such as a microscope slide.
Particularly preferred non-porous microparticles are the glycidyl methacrylate (GMA)
beads available from Bangs Laboratories (Carmel, IN). Such microparticles are
useful in a variety of sizes and derivatized with a variety of linkage groups for
synthesizing tags or tag complements. Preferably, for massively parallel
manipulations of tagged microparticles, 5 µm diameter GMA beads are employed.



Attaching Tags to Polynucleotides
For Sorting onto Solid Phase Supports
An important aspect of the invention is the sorting and attachment of a
population of polynucleotides, e.g. from a cDNA library, to microparticles or to
separate regions on a solid phase support such that each microparticle or region has
substantially only one kind of polynucleotide attached. This objective is
accomplished by ensuring that substantially all different polynucleotides have
different tags attached. This condition, in turn, is brought about by taking a sample of
the full ensemble of tag-polynucleotide conjugates for analysis. (It is acceptable that
identical polynucleotides have different tags, as it merely results in the same
polynucleotide being operated on or analyzed twice in two different locations.) Such
sampling can be carried out overtly (for example, by taking a small volume
from a larger mixture) after the tags have been attached to the polynucleotides; it can
be carried out inherently as a secondary effect of the techniques used to process the
polynucleotides and tags; or it can be carried out both overtly and as an
inherent part of processing steps.

Preferably, in constructing a cDNA library where substantially all different
cDNAs have different tags, a tag repertoire is employed whose complexity, or number
of distinct tags, greatly exceeds the total number of mRNAs extracted from a cell or
tissue sample. Preferably, the complexity of the tag repertoire is at least 10 times that
of the polynucleotide population; and more preferably, the complexity of the tag
repertoire is at least 100 times that of the polynucleotide population. Below, a
protocol is disclosed for cDNA library construction using a primer mixture that
contains a full repertoire of exemplary 9-word tags. Such a mixture of tag-containing
primers has a complexity of 8^9, or about 1.34 x 10^8. As indicated by Winslow et al,
Nucleic Acids Research, 19: 3251-3253 (1991), mRNA for library construction can
be extracted from as few as 10-100 mammalian cells. Since a single mammalian cell
contains about 5 x 10^5 copies of mRNA molecules of about 3.4 x 10^4 different kinds,
by standard techniques one can isolate the mRNA from about 100 cells, or
(theoretically) about 5 x 10^7 mRNA molecules. Comparing this number to the
complexity of the primer mixture shows that without any additional steps, and even
assuming that mRNAs are converted into cDNAs with perfect efficiency (1%
efficiency or less is more accurate), the cDNA library construction protocol results in
a population containing no more than 37% of the total number of different tags. That
is, without any overt sampling step at all, the protocol inherently generates a sample
that comprises 37%, or less, of the tag repertoire. The probability of obtaining a
double under these conditions is about 5%, which is within the preferred range. With
mRNA from 10 cells, the fraction of the tag repertoire sampled is reduced to only
3.7%, even assuming that all the processing steps take place at 100% efficiency. In
fact, the efficiencies of the processing steps for constructing cDNA libraries are very
low, a "rule of thumb" being that a good library should contain about 10^8 cDNA clones
from mRNA extracted from 10^6 mammalian cells.
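
The 37% and 3.7% figures above can be reproduced with a few lines of arithmetic. The Python sketch below uses only numbers given in the text (eight words, nine words per tag, about 5 x 10^5 mRNA molecules per cell); the constant and function names are illustrative, and the `efficiency` parameter is a hypothetical knob for exploring conversion efficiencies below 100%.

```python
TAG_COMPLEXITY = 8 ** 9      # 9-word tags built from an 8-word set
MRNA_PER_CELL = 5e5          # approximate mRNA molecules per mammalian cell (from the text)

def fraction_of_repertoire(cells: int, efficiency: float = 1.0) -> float:
    """Upper bound on the fraction of the tag repertoire consumed by the mRNA sample."""
    return cells * MRNA_PER_CELL * efficiency / TAG_COMPLEXITY

print(TAG_COMPLEXITY)
print(f"100 cells: {fraction_of_repertoire(100):.1%}")
print(f" 10 cells: {fraction_of_repertoire(10):.1%}")
```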

Where larger amounts of mRNA are used in the above protocol, or larger amounts
of polynucleotides in general, such that the number of such molecules exceeds the
complexity of the tag repertoire, a tag-polynucleotide conjugate mixture potentially
contains every possible pairing of tags and types of mRNA or polynucleotide. In such
cases, overt sampling may be implemented by removing a sample volume after a
serial dilution of the starting mixture of tag-polynucleotide conjugates. The amount
of dilution required depends on the amount of starting material and the efficiencies of
the processing steps, which are readily estimated.

If mRNA were extracted from 10^6 cells (which would correspond to about 0.5
µg of poly(A)+ RNA), and if primers were present in about 10-100 fold concentration
excess, as is called for in a typical protocol, e.g. Sambrook et al, Molecular Cloning,
Second Edition, page 8.61 [10 µL 1.8 kb mRNA at 1 mg/mL equals about 1.68 x 10^-11
moles and 10 µL 18-mer primer at 1 mg/mL equals about 1.68 x 10^-9 moles], then the
total number of tag-polynucleotide conjugates in a cDNA library would simply be
equal to or less than the starting number of mRNAs, or about 5 x 10^11 vectors
containing tag-polynucleotide conjugates. Again, this assumes that each step in cDNA
construction (first strand synthesis, second strand synthesis, ligation into a vector)
occurs with perfect efficiency, which is a very conservative estimate. The actual
number is significantly less.

If a sample of n tag-polynucleotide conjugates is randomly drawn from a
reaction mixture, as could be effected by taking a sample volume, the probability of
drawing conjugates having the same tag is described by the Poisson distribution,
P(r) = e^(-λ) λ^r / r!, where r is the number of conjugates having the same tag and λ = np,
where p is the probability of a given tag being selected. If n = 10^6 and p = 1/(1.34 x
10^8), then λ = 0.00746 and P(2) = 2.76 x 10^-5. Thus, a sample of one million molecules
gives rise to an expected number of doubles well within the preferred range. Such a
sample is readily obtained as follows: Assume that the 5 x 10^11 mRNAs are perfectly
converted into 5 x 10^11 vectors with tag-cDNA conjugates as inserts and that the 5 x
10^11 vectors are in a reaction solution having a volume of 100 µl. Four 10-fold serial
dilutions may be carried out by transferring 10 µl from the original solution into a
vessel containing 90 µl of an appropriate buffer, such as TE. This process may be
repeated for three additional dilutions to obtain a 100 µl solution containing 5 x 10^5
vector molecules per µl. A 2 µl aliquot from this solution yields 10^6 vectors
containing tag-cDNA conjugates as inserts. This sample is then amplified by
straightforward transformation of a competent host cell followed by culturing.
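
The Poisson figures and the dilution series above can be checked numerically. The following Python sketch uses the values stated in the text (n = 10^6, p = 1/(1.34 x 10^8), 5 x 10^11 vectors in 100 µl); the function and variable names are illustrative only.

```python
import math

def poisson(r: int, lam: float) -> float:
    """Poisson probability of exactly r events when the expected number is lam."""
    return math.exp(-lam) * lam ** r / math.factorial(r)

n = 10 ** 6                 # conjugates drawn in the sample
p = 1 / 1.34e8              # probability of a given tag being selected
lam = n * p
p_double = poisson(2, lam)
print(f"lambda = {lam:.5f}, P(2) = {p_double:.3g}")

# Serial dilution: 5 x 10^11 vectors in 100 ul, then four 10-fold dilutions.
vectors_per_ul = 5e11 / 100
for _ in range(4):
    vectors_per_ul /= 10
aliquot_vectors = 2 * vectors_per_ul   # vectors in a 2 ul aliquot
print(f"{vectors_per_ul:.3g} vectors per ul, {aliquot_vectors:.3g} in 2 ul")
```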

Of course, as mentioned above, no step in the above process proceeds with
perfect efficiency. In particular, when vectors are employed to amplify a sample of
tag-polynucleotide conjugates, the step of transforming a host is very inefficient.
Usually, no more than 1% of the vectors are taken up by the host and replicated.
Thus, for such a method of amplification, even fewer dilutions would be required to
obtain a sample of 10^6 conjugates.

A repertoire of oligonucleotide tags can be conjugated to a population of
polynucleotides in a number of ways, including direct enzymatic ligation,
amplification, e.g. via PCR, using primers containing the tag sequences, and the like.
The initial ligating step produces a very large population of tag-polynucleotide
conjugates such that a single tag is generally attached to many different
polynucleotides. However, as noted above, by taking a sufficiently small sample of
the conjugates, the probability of obtaining "doubles," i.e. the same tag on two
different polynucleotides, can be made negligible. Generally, the larger the sample
the greater the probability of obtaining a double. Thus, a design trade-off exists
between selecting a large sample of tag-polynucleotide conjugates (which, for
example, ensures adequate coverage of a target polynucleotide in a shotgun
sequencing operation or adequate representation of a rapidly changing mRNA pool)
and selecting a small sample which ensures that a minimal number of doubles will be
present. In most embodiments, the presence of doubles merely adds an additional
source of noise or, in the case of sequencing, a minor complication in scanning and
signal processing, as microparticles giving multiple fluorescent signals can simply be
ignored.

As used herein, the term "substantially all" in reference to attaching tags to
molecules, especially polynucleotides, is meant to reflect the statistical
nature of the sampling procedure employed to obtain a population of
tag-molecule conjugates essentially free of doubles. The meaning of
substantially all in terms of actual percentages of tag-molecule conjugates
depends on how the tags are being employed. Preferably, for nucleic acid
sequencing, substantially all means that at least eighty percent of the
polynucleotides have unique tags attached. More preferably, it means that at
least ninety percent of the polynucleotides have unique tags attached. Still
more preferably, it means that at least ninety-five percent of the
polynucleotides have unique tags attached. And, most preferably, it means that
at least ninety-nine percent of the polynucleotides have unique tags attached.

Preferably, when the population of polynucleotides consists of messenger RNA
(mRNA), oligonucleotide tags may be attached by reverse transcribing the mRNA
with a set of primers preferably containing complements of tag sequences. An
exemplary set of such primers could have the following sequence (SEQ ID NO: 1):

5'-mRNA-[A]n-3'
       [T]19GG[W,W,W,C]9ACCAGCTGATC-5'-biotin

where "[W,W,W,C]9" represents the sequence of an oligonucleotide tag of nine
subunits of four nucleotides each and "[W,W,W,C]" represents the subunit
sequences listed above, i.e. "W" represents T or A. The underlined sequences
identify an optional restriction endonuclease site that can be used to release
the polynucleotide from attachment to a solid phase support via the biotin, if
one is employed. For the above primer, the complement attached to a
microparticle could have the form:

5'-[G,W,W,W]9TGG-linker-microparticle
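
The relationship between a tag of the form [W,W,W,C]9 and its complement of the form [G,W,W,W]9 can be illustrated with a minimal sketch. It assumes, for illustration only, that every four-nucleotide word matching W,W,W,C (W = A or T) is allowed, which gives 8 words and 8^9 tags; the function names are hypothetical.

```python
import itertools
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def words_wwwc() -> list[str]:
    """All four-nucleotide words of the form W,W,W,C with W = A or T."""
    return ["".join(w) + "C" for w in itertools.product("AT", repeat=3)]

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def random_tag(n_words: int = 9) -> str:
    """Concatenate n_words randomly chosen words into one tag."""
    words = words_wwwc()
    return "".join(random.choice(words) for _ in range(n_words))

words = words_wwwc()
tag = random_tag()
tag_complement = reverse_complement(tag)

print(len(words), "words;", len(words) ** 9, "possible nine-word tags")
print("tag:       ", tag)
print("complement:", tag_complement)
```

Read 5' to 3', each four-nucleotide block of the complement begins with G followed by three W positions, matching the [G,W,W,W] notation used for the microparticle-bound sequence.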

After reverse transcription, the mRNA is removed, e.g. by RNase H digestion,
and the second strand of the cDNA is synthesized using, for example, a primer
of the following form (SEQ ID NO: 2):

5'-NRRGATCYNNN-3'

where N is any one of A, T, G, or C; R is a purine-containing nucleotide, and Y
is a pyrimidine-containing nucleotide. This particular primer creates a Bst YI
restriction site in the resulting double stranded DNA which, together with the
Sal I site, facilitates cloning into a vector with, for example, Bam HI and
Xho I sites. After Bst YI and Sal I digestion, the exemplary conjugate would
have the form:

5'-RCGACCA[C,W,W,W]9GG[T]19- cDNA -NNNR
       GGT[G,W,W,W]9CC[A]19- rDNA -NNNYCTAG-5'

The polynucleotide-tag conjugates may then be manipulated using standard
molecular biology techniques. For example, the above conjugate (which is
actually a mixture) may be inserted into commercially available cloning
vectors, e.g. Stratagene Cloning System (La Jolla, CA), and transfected into a
host, such as a commercially available host bacterium, which is then cultured
to increase the number of conjugates. The cloning vectors may then be isolated
using standard techniques, e.g. Sambrook et al, Molecular Cloning, Second
Edition (Cold Spring Harbor Laboratory, New York, 1989). Alternatively,
appropriate adaptors and primers may be employed so that the conjugate
population can be increased by PCR.

Preferably, when the ligase-based method of sequencing is employed, the Bst YI
and Sal I digested fragments are cloned into a Bam HI-/Xho I-digested vector
having the following single-copy restriction sites (SEQ ID NO: 3):

5'-GA GGATG CCTTTAT GGATCC A CTCGAG ATCCCAATCCA-3'
      FokI          BamHI    XhoI

This adds the Fok I site which will allow initiation of the sequencing process
discussed more fully below.
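
As a quick check of the single-copy property, the following sketch locates the three recognition sequences in the top strand of SEQ ID NO: 3 as printed above. The recognition sequences used (GGATG for Fok I, GGATCC for Bam HI, CTCGAG for Xho I) are the standard ones; the helper name is illustrative.

```python
import re

# Top strand of SEQ ID NO: 3 as given above, with the spacing removed.
SEQ_ID_NO_3 = "GAGGATGCCTTTATGGATCCACTCGAGATCCCAATCCA"

SITES = {"FokI": "GGATG", "BamHI": "GGATCC", "XhoI": "CTCGAG"}

def site_positions(seq: str, site: str) -> list[int]:
    """0-based start positions of every occurrence, including overlapping ones."""
    return [m.start() for m in re.finditer(f"(?={site})", seq)]

for name, site in SITES.items():
    print(name, site, site_positions(SEQ_ID_NO_3, site))
```

Each site is reported once on the strand shown; a fuller check would also scan the complementary strand for the asymmetric Fok I site.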

Tags can be conjugated to cDNAs of existing libraries by standard cloning
methods. cDNAs are excised from their existing vector, isolated, and then
ligated into a vector containing a repertoire of tags. Preferably, the
tag-containing vector is linearized by cleaving with two restriction enzymes so
that the excised cDNAs can be ligated in a predetermined orientation. The
concentration of the linearized tag-containing vector is in substantial excess
over that of the cDNA inserts so that ligation provides an inherent sampling of
tags.

A general method for exposing the single stranded tag after amplification
involves digesting a target polynucleotide-containing conjugate with the 5'→3'
exonuclease activity of T4 DNA polymerase, or a like enzyme. When used in the
presence of a single deoxynucleoside triphosphate, such a polymerase will
cleave nucleotides from 3' recessed ends present on the non-template strand of
a double stranded fragment until a complement of the single deoxynucleoside
triphosphate is reached on the template strand. When such a nucleotide is
reached the 5'→3' digestion effectively ceases, as the polymerase's extension
activity adds nucleotides at a higher rate than the excision activity removes
nucleotides. Consequently, single stranded tags constructed with three
nucleotides are readily prepared for loading onto solid phase supports.

The technique may also be used to preferentially methylate interior Fok I sites
of a target polynucleotide while leaving a single Fok I site at the terminus of
the polynucleotide unmethylated. First, the terminal Fok I site is rendered
single stranded using a polymerase with deoxycytidine triphosphate. The double
stranded portion of the fragment is then methylated, after which the single
stranded terminus is filled in with a DNA polymerase in the presence of all
four nucleoside triphosphates, thereby regenerating the Fok I site. Clearly,
this procedure can be generalized to endonucleases other than Fok I.

After the oligonucleotide tags are prepared for specific hybridization, e.g. by
rendering them single stranded as described above, the polynucleotides are
mixed with microparticles containing the complementary sequences of the tags
under conditions that favor the formation of perfectly matched duplexes between
the tags and their complements. There is extensive guidance in the literature
for creating these conditions. Exemplary references providing such guidance
include Wetmur, Critical Reviews in Biochemistry and Molecular Biology, 26:
227-259 (1991); Sambrook et al, Molecular Cloning: A Laboratory Manual, 2nd
Edition (Cold Spring Harbor Laboratory, New York, 1989); and the like.
Preferably, the hybridization conditions are sufficiently stringent so that
only perfectly matched sequences form stable duplexes. Under such conditions
the polynucleotides specifically hybridized through their tags may be ligated
to the complementary sequences attached to the microparticles. Finally, the
microparticles are washed to remove polynucleotides with unligated and/or
mismatched tags.

When CPG microparticles conventionally employed as synthesis supports are used,
the density of tag complements on the microparticle surface is typically
greater than that necessary for some sequencing operations. That is, in
sequencing approaches that require successive treatment of the attached
polynucleotides with a variety of enzymes, densely spaced polynucleotides may
tend to inhibit access of the relatively bulky enzymes to the polynucleotides.
In such cases, the polynucleotides are preferably mixed with the microparticles
so that tag complements are present in significant excess, e.g. from 10:1 to
100:1, or greater, over the polynucleotides. This ensures that the density of
polynucleotides on the microparticle surface will not be so high as to inhibit
enzyme access. Preferably, the average inter-polynucleotide spacing on the
microparticle surface is on the order of 30-100 nm. Guidance in selecting
ratios for standard CPG supports and Ballotini beads (a type of solid glass
support) is found in Maskos and Southern, Nucleic Acids Research, 20: 1679-1684
(1992). Preferably, for sequencing applications, standard CPG beads of diameter
in the range of 20-50 µm are loaded with about 10^5 polynucleotides, and GMA
beads of diameter in the range of 5-10 µm are loaded with a few tens of
thousands of polynucleotides, e.g. 4 x 10^4 to 6 x 10^4.

In the preferred embodiment, tag complements are synthesized on microparticles
combinatorially; thus, at the end of the synthesis, one obtains a complex
mixture of microparticles from which a sample is taken for loading tagged
polynucleotides. The size of the sample of microparticles will depend on
several factors, including the size of the repertoire of tag complements, the
nature of the apparatus used for observing loaded microparticles (e.g. its
capacity), the tolerance for multiple copies of microparticles with the same
tag complement (i.e. "bead doubles"), and the like. The following table
provides guidance regarding microparticle sample size, microparticle diameter,
and the approximate physical dimensions of a packed array of microparticles of
various diameters.

Microparticle diameter        5 µm           10 µm          20 µm          40 µm

Max. no. polynucleotides
loaded at 1 per 10^5 sq.
angstrom                                     3 x 10^5       1.26 x 10^6    5 x 10^6

Approx. area of monolayer
of 10^6 microparticles        .45 x .45 cm   1 x 1 cm       2 x 2 cm       4 x 4 cm
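
The loading figures in the table above are consistent with a simple geometric estimate, sketched below. The sketch assumes a smooth sphere of surface area πd² with one polynucleotide per 10^5 square angstroms, and a square grid of touching microparticles for the monolayer; both are simplifying assumptions, and the function names are illustrative.

```python
import math

ANGSTROM_PER_UM = 1e4

def max_polynucleotides(diameter_um: float, sq_angstrom_per_molecule: float = 1e5) -> float:
    """Sphere surface area divided by the area allotted to each polynucleotide."""
    d = diameter_um * ANGSTROM_PER_UM
    return math.pi * d ** 2 / sq_angstrom_per_molecule

def monolayer_side_cm(diameter_um: float, n_particles: float = 1e6) -> float:
    """Side of a square holding n_particles touching spheres on a square grid."""
    return math.sqrt(n_particles) * diameter_um * 1e-4

for d in (5, 10, 20, 40):
    print(f"{d:>2} um: {max_polynucleotides(d):.3g} polynucleotides, "
          f"monolayer about {monolayer_side_cm(d):.2f} cm on a side")
```

The square-grid estimate for 5 µm particles comes out slightly above the tabulated .45 cm, which would be expected if the table assumes a tighter packing.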

The probability that the sample of microparticles contains a given tag
complement, or that a given tag complement is present in multiple copies, is
described by the Poisson distribution, as indicated in the following table.

Table VII

Number of              Fraction of          Fraction of            Fraction of
microparticles in      repertoire of tag    microparticles in      microparticles in
sample (as fraction    complements          sample with unique     sample carrying same
of repertoire size),   present in sample,   tag complement         tag complement as one
m                      1-e^-m               attached,              other microparticle in
                                            m(e^-m)                sample ("bead doubles"),
                                                                   m^2(e^-m)/2

1.000                  0.63                                        0.18
.693                   0.50                 0.35                   0.12
.405                   0.33                 0.27                   0.05
.285                   0.25                 0.21                   0.03
.223                   0.20                 0.18                   0.02
.105                   0.10                 0.09                   0.005
.010                   0.01                 0.01
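
The tabulated fractions can be reproduced from the Poisson distribution with a short sketch. It assumes that the fraction of tag complements present is 1-e^-m, that the fraction with a unique tag complement is m·e^-m, and that the fraction of bead doubles is m²·e^-m/2; these expressions agree with the numerical entries of Table VII, and the function names are illustrative.

```python
import math

def fraction_present(m: float) -> float:
    """Fraction of the repertoire represented at least once in the sample."""
    return 1 - math.exp(-m)

def fraction_unique(m: float) -> float:
    """Poisson probability of exactly one copy, P(1) = m * e^-m."""
    return m * math.exp(-m)

def fraction_doubles(m: float) -> float:
    """Poisson probability of exactly two copies, P(2) = m^2 * e^-m / 2."""
    return m ** 2 * math.exp(-m) / 2

print(f"{'m':>6} {'present':>8} {'unique':>8} {'doubles':>8}")
for m in (1.000, 0.693, 0.405, 0.285, 0.223, 0.105, 0.010):
    print(f"{m:6.3f} {fraction_present(m):8.2f} "
          f"{fraction_unique(m):8.2f} {fraction_doubles(m):8.3f}")
```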





High Specificity Sorting and Panning 

The kinetics of sorting depends on the rate of hybridization of oligonucleotide 
tags to their tag complements which, in turn, depends on the complexity of the tags in 
the hybridization reaction. Thus, a trade off exists between sorting rate and tag 
complexity, such that an increase in sorting rate may be achieved at the cost of 
reducing the complexity of the tags involved in the hybridization reaction. As 
explained below, the effects of this trade off may be ameliorated by "panning." 

Specificity of the hybridizations may be increased by taking a sufficiently
small sample so that both a high percentage of tags in the sample are unique
and the nearest neighbors of substantially all the tags in a sample differ by
at least two words. This latter condition may be met by taking a sample that
contains a number of tag-polynucleotide conjugates that is about 0.1 percent or
less of the size of the repertoire being employed. For example, if tags are
constructed with eight words selected from Table II, a repertoire of 8^8, or
about 1.67 x 10^7, tags and tag complements are produced. In a library of
tag-cDNA conjugates as described above, a 0.1 percent sample means that about
16,700 different tags are present. If this were loaded directly onto a
repertoire-equivalent of microparticles, or in this example a sample of 1.67 x
10^7 microparticles, then only a sparse subset of the sampled microparticles
would be loaded. The density of loaded microparticles can be increased (for
example, for more efficient sequencing) by undertaking a "panning" step in
which the sampled tag-cDNA conjugates are used to separate loaded
microparticles from unloaded microparticles. Thus, in the example above, even
though a "0.1 percent" sample contains only 16,700 cDNAs, the sampling and
panning steps may be repeated until as many loaded microparticles as desired
are accumulated.

A panning step may be implemented by providing a sample of tag-cDNA conjugates
each of which contains a capture moiety at an end opposite, or distal to, the
oligonucleotide tag. Preferably, the capture moiety is of a type which can be
released from the tag-cDNA conjugates, so that the tag-cDNA conjugates can be
sequenced with a single-base sequencing method. Such moieties may comprise
biotin, digoxigenin, or like ligands, a triplex binding region, or the like.
Preferably, such a capture moiety comprises a biotin component. Biotin may be
attached to tag-cDNA conjugates by a number of standard techniques. If
appropriate adapters containing PCR primer binding sites are attached to
tag-cDNA conjugates, biotin may be attached by using a biotinylated primer in
an amplification after sampling. Alternatively, if the tag-cDNA conjugates are
inserts of cloning vectors, biotin may be attached after excising the tag-cDNA
conjugates by digestion with an appropriate restriction enzyme followed by
isolation and filling in a protruding strand distal to the tags with a DNA
polymerase in the presence of biotinylated uridine triphosphate.

After a tag-cDNA conjugate is captured, it may be released from the biotin
moiety in a number of ways, such as by a chemical linkage that is cleaved by
reduction, e.g. Herman et al, Anal. Biochem., 156: 48-55 (1986), or that is
cleaved photochemically, e.g. Olejnik et al, Nucleic Acids Research, 24:
361-366 (1996), or that is cleaved enzymatically by introducing a restriction
site in the PCR primer. The latter embodiment can be exemplified by considering
the library of tag-polynucleotide conjugates described above:

5'-RCGACCA[C,W,W,W]9GG[T]19- cDNA -NNNR
       GGT[G,W,W,W]9CC[A]19- rDNA -NNNYCTAG-5'




The following adapters may be ligated to the ends of these fragments to permit
amplification by PCR:



5'-XXXXXXXXXXXXXXXXXXXX
   XXXXXXXXXXXXXXXXXXXXYGAT

Right Adapter

GATCZZACTAGTZZZZZZZZZZZZ-3'
    ZZTGATCAZZZZZZZZZZZZ



WO 97/13877



PCT/US96/16342 



Left Adapter

ZZTGATCAZZZZZZZZZZZZ-5'-biotin

Left Primer

where "ACTAGT" is a Spe I recognition site (which leaves a staggered cleavage
ready for single base sequencing), and the X's and Z's are nucleotides selected
so that the annealing and dissociation temperatures of the respective primers
are approximately the same. After ligation of the adapters and amplification by
PCR using the biotinylated primer, the tags of the conjugates are rendered
single stranded by the exonuclease activity of T4 DNA polymerase and conjugates
are combined with a sample of microparticles, e.g. a repertoire equivalent,
with tag complements attached. After annealing under stringent conditions (to
minimize mis-attachment of tags), the conjugates are preferably ligated to
their tag complements and the loaded microparticles are separated from the
unloaded microparticles by capture with avidinated magnetic beads, or like
capture technique.

Returning to the example, this process results in the accumulation of about
10,500 (=16,700 x .63) loaded microparticles with different tags, which may be
released from the magnetic beads by cleavage with Spe I. By repeating this
process 40-50 times with new samples of microparticles and tag-cDNA conjugates,
4-5 x 10^5 cDNAs can be accumulated by pooling the released microparticles. The
pooled microparticles may then be simultaneously sequenced by a single-base
sequencing technique.
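
The yield of repeated sampling and panning can be estimated with a minimal sketch. It assumes a repertoire of 8^8 tags, a 0.1 percent sample of conjugates per round, and a repertoire-equivalent of microparticles (m = 1), so that a fraction 1-e^-1 (about .63) of sampled tags finds a matching microparticle; the function names are illustrative and the exact repertoire size gives slightly higher figures than the rounded values in the text.

```python
import math

def loaded_per_round(repertoire: int, sample_fraction: float, m: float = 1.0) -> float:
    """Expected number of distinct loaded microparticles captured in one round."""
    tags_in_sample = repertoire * sample_fraction
    fraction_with_bead = 1 - math.exp(-m)   # Poisson fraction of tags represented on beads
    return tags_in_sample * fraction_with_bead

def rounds_needed(target: float, per_round: float) -> int:
    """Number of sampling-and-panning rounds needed to pool at least target cDNAs."""
    return math.ceil(target / per_round)

repertoire = 8 ** 8                          # about 1.67 x 10^7 tags
per_round = loaded_per_round(repertoire, 0.001)

print(f"about {per_round:,.0f} loaded microparticles per round")
for target in (4e5, 5e5):
    print(f"{target:.0e} cDNAs: about {rounds_needed(target, per_round)} rounds")
```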

Determining how many times to repeat the sampling and panning steps, or more
generally, determining how many cDNAs to analyze, depends on one's objective.
If the objective is to monitor the changes in abundance of relatively common
sequences, e.g. making up 5% or more of a population, then relatively small
samples, i.e. a small fraction of the total population size, may allow
statistically significant estimates of relative abundances. On the other hand,
if one seeks to monitor the abundances of rare sequences, e.g. making up 0.1%
or less of a population, then large samples are required. Generally, there is a
direct relationship between sample size and the reliability of the estimates of
relative abundances based on the sample. There is extensive guidance in the
literature on determining appropriate sample sizes for making reliable
statistical estimates, e.g. Koller et al, Nucleic Acids Research, 23: 185-191
(1994); Good, Biometrika, 40: 16-264 (1953); Bunge et al, J. Am. Stat. Assoc.,
88: 364-373 (1993); and the like. Preferably, for
monitoring changes in gene expression based on the analysis of a series of cDNA
libraries containing 10^5 to 10^8 independent clones of 3.0-3.5 x 10^4
different sequences, a sample of at least 10^4 sequences is accumulated for
analysis of each library. More preferably, a sample of at least 10^5 sequences
is accumulated for the analysis of each library; and most preferably, a sample
of at least 5 x 10^5 sequences is accumulated for the analysis of each library.
Alternatively, the number of sequences sampled is preferably sufficient to
estimate the relative abundance of a sequence present at a frequency within the
range of 0.1% to 5% with a 95% confidence limit no larger than 0.1% of the
population size.
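
One way to translate such a confidence criterion into a sample size is sketched below. It is an illustration only: it assumes that each sampled sequence is an independent draw, uses the normal approximation to the binomial distribution, and interprets the criterion as a 95% confidence half-width of 0.001 on the estimated frequency. None of these choices is stated in the text, and the function name is hypothetical.

```python
import math

def sample_size(p: float, half_width: float, z: float = 1.96) -> int:
    """Smallest n for which z * sqrt(p * (1 - p) / n) <= half_width."""
    return math.ceil(z ** 2 * p * (1 - p) / half_width ** 2)

for p in (0.001, 0.01, 0.05):
    n = sample_size(p, half_width=0.001)
    print(f"frequency {p:.3f}: about {n:,} sequences")
```

Under these assumptions the most abundant sequences in the range set the requirement, because the binomial variance p(1-p) grows with p over this interval.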

Single Base DNA Sequencing 
The present invention can be employed with conventional methods of DNA
sequencing, e.g. as disclosed by Hultman et al, Nucleic Acids Research, 17:
4937-4946 (1989). However, for parallel, or simultaneous, sequencing of
multiple polynucleotides, a DNA sequencing methodology is preferred that
requires neither electrophoretic separation of closely sized DNA fragments nor
analysis of cleaved nucleotides by a separate analytical procedure, as in
peptide sequencing. Preferably, the methodology permits the stepwise
identification of nucleotides, usually one at a time, in a sequence through
successive cycles of treatment and detection. Such methodologies are referred
to herein as "single base" sequencing methods. Single base approaches are
disclosed in the following references: Cheeseman, U.S. patent 5,302,509; Tsien
et al, International application WO 91/06678; Rosenthal et al, International
application WO 93/21340; Canard et al, Gene, 148: 1-6 (1994); and Metzker et
al, Nucleic Acids Research, 22: 4259-4267 (1994).

A "single base" method of DNA sequencing which is suitable for use with the
present invention and which requires no electrophoretic separation of DNA
fragments is described in International application PCT/US95/03678. Briefly,
the method comprises the following steps: (a) ligating a probe to an end of the
polynucleotide having a protruding strand to form a ligated complex, the probe
having a complementary protruding strand to that of the polynucleotide and the
probe having a nuclease recognition site; (b) removing unligated probe from the
ligated complex; (c) identifying one or more nucleotides in the protruding
strand of the polynucleotide by the identity of the ligated probe; (d) cleaving
the ligated complex with a nuclease; and (e) repeating steps (a) through (d)
until the nucleotide sequence of the polynucleotide, or a portion thereof, is
determined.

A single signal generating moiety, such as a single fluorescent dye, may be
employed when sequencing several different target polynucleotides attached to
different spatially addressable solid phase supports, such as fixed
microparticles, in a parallel sequencing operation. This may be accomplished by
providing four sets of probes that are applied sequentially to the plurality of
target polynucleotides on the different microparticles. An exemplary set of
such probes is shown below:





Set 1              Set 2              Set 3              Set 4

 ANNNN...NN        dANNNN...NN        dANNNN...NN        dANNNN...NN
     N...NNTT...T*      N...NNTT...T       N...NNTT...T       N...NNTT...T

dCNNNN...NN         CNNNN...NN        dCNNNN...NN        dCNNNN...NN
     N...NNTT...T       N...NNTT...T*      N...NNTT...T       N...NNTT...T

dGNNNN...NN        dGNNNN...NN         GNNNN...NN        dGNNNN...NN
     N...NNTT...T       N...NNTT...T       N...NNTT...T*      N...NNTT...T

dTNNNN...NN        dTNNNN...NN        dTNNNN...NN         TNNNN...NN
     N...NNTT...T       N...NNTT...T       N...NNTT...T       N...NNTT...T*



where each of the listed probes represents a mixture of 4^3=64 oligonucleotides
such that the identity of the 3' terminal nucleotide of the top strand is fixed
and the other positions in the protruding strand are filled by every 3-mer
permutation of nucleotides, or complexity reducing analogs. The listed probes
are also shown with a single stranded poly-T tail with a signal generating
moiety attached to the terminal thymidine, shown as "T*". The "d" on the
unlabeled probes designates a ligation-blocking moiety or absence of
3'-hydroxyl, which prevents unlabeled probes from being ligated. Preferably,
such 3'-terminal nucleotides are dideoxynucleotides. In this embodiment, the
probes of set 1 are first applied to the plurality of target polynucleotides
and treated with a ligase so that target polynucleotides having a thymidine
complementary to the 3' terminal adenosine of the labeled probes are ligated.
The unlabeled probes are simultaneously applied to minimize inappropriate
ligations. The locations of the target polynucleotides that form ligated
complexes with probes terminating in "A" are identified by the signal generated
by the label carried on the probe. After washing and cleavage, the probes of
set 2 are applied. In this case, target polynucleotides forming ligated
complexes with probes terminating in "C" are identified by location. Similarly,
the probes of sets 3 and 4 are applied and locations of positive signals
identified. This process of sequentially applying the four sets of probes
continues until the desired number of nucleotides are identified on the target
polynucleotides. Clearly, one of ordinary skill could construct similar sets of
probes that could have many variations, such as having protruding strands of
different lengths, different moieties to block ligation of unlabeled probes,
different means for labeling probes, and the like.
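
The logic of reading bases with a single label and four sequentially applied probe sets can be illustrated with a toy model. The sketch below is a deliberate simplification: it abstracts away ligation, washing, and cleavage chemistry, assumes that exactly one template nucleotide is interrogated per cycle, and uses hypothetical location names; it is not the disclosed procedure itself.

```python
# Labeled probe of each set terminates in a different nucleotide.
LABELED_TERMINUS = {1: "A", 2: "C", 3: "G", 4: "T"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def sequence_in_parallel(targets: dict[str, str], n_cycles: int) -> dict[str, str]:
    """Infer template bases at each location from which probe set gives a signal.

    targets maps a microparticle location to its template bases in read order.
    """
    reads = {location: "" for location in targets}
    for cycle in range(n_cycles):
        for set_no, probe_base in LABELED_TERMINUS.items():
            for location, template in targets.items():
                if cycle >= len(template):
                    continue
                # A signal appears only where the labeled probe's terminal
                # nucleotide is complementary to the exposed template base.
                if COMPLEMENT[probe_base] == template[cycle]:
                    reads[location] += COMPLEMENT[probe_base]
    return reads

targets = {"bead_1": "TGCA", "bead_2": "AATT", "bead_3": "GGAC"}
print(sequence_in_parallel(targets, n_cycles=4))
```

Because only one of the four sets carries a label on the probe matching a given template base, the set number at which a location lights up identifies the base at that location for the current cycle.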
Apparatus for Sequencing Populations of Polynucleotides
An objective of the invention is to sort identical molecules, particularly
polynucleotides, onto the surfaces of microparticles by the specific
hybridization of tags and their complements. Once such sorting has taken place,
the presence of the molecules or operations performed on them can be detected
in a number of ways depending on the nature of the tagged molecule, whether
microparticles are detected separately or in "batches," whether repeated
measurements are desired, and the like. Typically, the sorted molecules are
exposed to ligands for binding, e.g. in drug development, or are subjected to
chemical or enzymatic processes, e.g. in polynucleotide sequencing. In both of
these uses it is often desirable to simultaneously observe signals
corresponding to such events or processes on large numbers of microparticles.
Microparticles carrying sorted molecules (referred to herein as "loaded"
microparticles) lend themselves to such large scale parallel operations, e.g.
as demonstrated by Lam et al (cited above).

Preferably, whenever light-generating signals, e.g. chemiluminescent,
fluorescent, or the like, are employed to detect events or processes, loaded
microparticles are spread on a planar substrate, e.g. a glass slide, for
examination with a scanning system, such as described in International patent
applications PCT/US91/09217, PCT/NL90/00081, and PCT/US95/01886. The scanning
system should be able to reproducibly scan the substrate and to define the
positions of each microparticle in a predetermined region by way of a
coordinate system. In polynucleotide sequencing applications, it is important
that the positional identification of microparticles be repeatable in
successive scan steps.

Such scanning systems may be constructed from commercially available
components, e.g. x-y translation table controlled by a digital computer used
with a detection system comprising one or more photomultiplier tubes, or
alternatively, a CCD array, and appropriate optics, e.g. for exciting,
collecting, and sorting fluorescent signals. In some embodiments a confocal
optical system may be desirable. An exemplary scanning system suitable for use
in four-color sequencing is illustrated diagrammatically in Figure 5. Substrate
300, e.g. a microscope slide with fixed microparticles, is placed on x-y
translation table 302, which is connected to and controlled by an appropriately
programmed digital computer 304 which may be any of a variety of commercially
available personal computers, e.g. 486-based machines or PowerPC model 7100 or
8100 available from Apple Computer (Cupertino, CA). Computer software for table
translation and data collection functions can be provided by commercially
available laboratory software, such as LabWindows, available from National
Instruments.
Substrate 300 and table 302 are operationally associated with microscope 306
having one or more objective lenses 308 which are capable of collecting and
delivering light to microparticles fixed to substrate 300. Excitation beam 310
from light source 312, which is preferably a laser, is directed to beam
splitter 314, e.g. a dichroic mirror, which re-directs the beam through
microscope 306 and objective lens 308 which, in turn, focuses the beam onto
substrate 300. Lens 308 collects fluorescence 316 emitted from the
microparticles and directs it through beam splitter 314 to signal distribution
optics 318 which, in turn, directs fluorescence to one or more suitable
opto-electronic devices for converting some fluorescence characteristic, e.g.
intensity, lifetime, or the like, to an electrical signal. Signal distribution
optics 318 may comprise a variety of components standard in the art, such as
bandpass filters, fiber optics, rotating mirrors, fixed position mirrors and
lenses, diffraction gratings, and the like. As illustrated in Figure 2, signal
distribution optics 318 directs fluorescence 316 to four separate
photomultiplier tubes, 330, 332, 334, and 336, whose output is then directed to
pre-amps and photon counters 350, 352, 354, and 356. The output of the photon
counters is collected by computer 304, where it can be stored, analyzed, and
viewed on video 360. Alternatively, signal distribution optics 318 could be a
diffraction grating which directs fluorescent signal 318 onto a CCD array.

The stability and reproducibility of the positional localization in scanning
will determine, to a large extent, the resolution for separating closely spaced
microparticles. Preferably, the scanning systems should be capable of resolving
closely spaced microparticles, e.g. separated by a particle diameter or less.
Thus, for most applications, e.g. using CPG microparticles, the scanning system
should at least have the capability of resolving objects on the order of 10-100
µm. Even higher resolution may be desirable in some embodiments, but with
increased resolution, the time required to fully scan a substrate will
increase; thus, in some embodiments a compromise may have to be made between
speed and resolution. Reductions in scanning time can be achieved by a system
which only scans positions where microparticles are known to be located, e.g.
from an initial full scan. Preferably, microparticle size and scanning system
resolution are selected to permit resolution of fluorescently labeled
microparticles randomly disposed on a plane at a density between about ten
thousand and one hundred thousand microparticles per cm^2.

In sequencing applications, loaded microparticles can be fixed to the surface
of a substrate in a variety of ways. The fixation should be strong enough to
allow the microparticles to undergo successive cycles of reagent exposure and
washing without significant loss. When the substrate is glass, its surface may
be derivatized with an alkylamino linker using commercially available reagents,
e.g. Pierce Chemical, which in turn may be cross-linked to avidin, again using
conventional chemistries, to form an avidinated surface. Biotin moieties can be
introduced to the loaded microparticles in a number of ways. For example, a
fraction, e.g. 10-15 percent, of the cloning vectors used to attach tags to
polynucleotides are engineered to contain a unique restriction site (providing
sticky ends on digestion) immediately adjacent to the polynucleotide insert at
an end of the polynucleotide opposite of the tag. The site is excised with the
polynucleotide and tag for loading onto microparticles. After loading, about
10-15 percent of the loaded polynucleotides will possess the unique restriction
site distal from the microparticle surface. After digestion with the associated
restriction endonuclease, an appropriate double stranded adaptor containing a
biotin moiety is ligated to the sticky end. The resulting microparticles are
then spread on the avidinated glass surface where they become fixed via the
biotin-avidin linkages.

Alternatively and preferably when sequencing by ligation is employed, in the
initial ligation step a mixture of probes is applied to the loaded
microparticle: a fraction of the probes contain a type IIs restriction
recognition site, as required by the sequencing method, and a fraction of the
probes have no such recognition site, but instead contain a biotin moiety at
their non-ligating end. Preferably, the mixture comprises about 10-15 percent
of the biotinylated probe.

In still another alternative, when DNA-loaded microparticles are applied to a
glass substrate, the DNA may nonspecifically adsorb to the glass surface upon
several hours, e.g. 24 hours, incubation to create a bond sufficiently strong
to permit repeated exposures to reagents and washes without significant loss of
microparticles. Preferably, such a glass substrate is a flow cell, which may
comprise a channel etched in a glass slide. Preferably, such a channel is
closed so that fluids may be pumped through it and has a depth sufficiently
close to the diameter of the microparticles so that a monolayer of
microparticles is trapped within a defined observation region.

Identification of Novel Polynucleotides
in cDNA Libraries

Novel polynucleotides in a cDNA library can be identified by constructing a
library of cDNA molecules attached to microparticles, as described above. A large
fraction of the library, or even the entire library, can then be partially sequenced in
parallel. After isolation of mRNA, and perhaps normalization of the population as
taught by Soares et al, Proc. Natl. Acad. Sci., 91: 9228-9232 (1994), or like
references, the following primer may be hybridized to the polyA tails for first strand
synthesis with a reverse transcriptase using conventional protocols (SEQ ID NO: 1):






WO 97/13877



PCT/US96/16342 



5'-mRNA-[A]n-3'
        [T]19-[primer site]-GG[W,W,W,C]9ACCAGCTGATC-5'

where [W,W,W,C]9 represents a tag as described above, "ACCAGCTGATC" is an 
optional sequence forming a restriction site in double stranded form, and "primer site" 
is a sequence common to all members of the library that is later used as a primer 
binding site for amplifying polynucleotides of interest by PCR.

After reverse transcription and second strand synthesis by conventional 
techniques, the double stranded fragments are inserted into a cloning vector as 
described above and amplified. The amplified library is then sampled and the sample 
amplified. The cloning vectors from the amplified sample are isolated, and the tagged 
cDNA fragments excised and purified. After rendering the tag single stranded with a 
polymerase as described above, the fragments are methylated and sorted onto
microparticles in accordance with the invention. Preferably, as described above, the 
cloning vector is constructed so that the tagged cDNAs can be excised with an 
endonuclease, such as Fok I, that will allow immediate sequencing by the preferred 
single base method after sorting and ligation to microparticles. 

Stepwise sequencing is then carried out simultaneously on the whole library, 
or one or more large fractions of the library, in accordance with the invention until a 
sufficient number of nucleotides are identified on each cDNA for unique
representation in the genome of the organism from which the library is derived. For 
example, if the library is derived from mammalian mRNA then a randomly selected 
sequence 14-15 nucleotides long is expected to have unique representation among the 
2-3 thousand megabases of the typical mammalian genome. Of course identification 
of far fewer nucleotides would be sufficient for unique representation in a library 
derived from bacteria, or other lower organisms. Preferably, at least 20-30 
nucleotides are identified to ensure unique representation and to permit construction 
of a suitable primer as described below. The tabulated sequences may then be 
compared to known sequences to identify unique cDNAs. 
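
The relationship between signature length and genome size described above can be checked with a rough calculation. The sketch below is illustrative only and assumes a uniform random sequence model (all four bases equally likely and independent), which real genomes do not follow; the function name and the 3 x 10^9 base genome size are assumptions chosen for the example. It shows why 14-15 nucleotides gives a number of possible signatures of the same order as a mammalian genome, and why 20-30 nucleotides leaves a wide margin.

```python
def expected_chance_matches(signature_length: int, genome_size_bp: float) -> float:
    """Expected number of chance occurrences of one fixed signature in a
    random sequence of genome_size_bp bases (uniform, independent bases)."""
    return genome_size_bp / 4 ** signature_length


# Hypothetical genome size of 3 thousand megabases (upper end of the range in the text).
GENOME_BP = 3e9

for k in (14, 15, 20, 30):
    # 4**k is the number of distinct signatures of length k.
    print(k, 4 ** k, expected_chance_matches(k, GENOME_BP))
```

Under this simplified model the expected number of chance matches falls by a factor of four for each added nucleotide, so signatures of 20 or more nucleotides are very unlikely to recur by chance.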

Unique cDNAs are then isolated by conventional techniques, e.g. constructing 
a probe from the PCR amplicon produced with primers directed to the primer site and
the portion of the cDNA whose sequence was determined. The probe may then be 
used to identify the cDNA in a library using a conventional screening protocol. 

The above method for identifying new cDNAs may also be used to fingerprint 
mRNA populations, either in isolated measurements or in the context of a 
dynamically changing population. Partial sequence information is obtained 
simultaneously from a large sample, e.g. ten to a hundred thousand, or more, of 
cDNAs attached to separate microparticles as described in the above method. 
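
A fingerprint of this kind amounts to a frequency distribution over the partial sequences (signatures) read from the microparticles. The following is a minimal sketch, not the method of the specification: the function names and the toy signatures are hypothetical, and it assumes each microparticle contributes exactly one signature string.

```python
from collections import Counter


def frequency_distribution(signatures):
    """Map each distinct signature to its relative frequency in the sample."""
    counts = Counter(signatures)
    total = sum(counts.values())
    return {sig: n / total for sig, n in counts.items()}


def frequency_shift(dist_a, dist_b):
    """Difference in relative frequency (b minus a) for every signature seen
    in either distribution; signatures absent from one side count as 0."""
    keys = set(dist_a) | set(dist_b)
    return {k: dist_b.get(k, 0.0) - dist_a.get(k, 0.0) for k in keys}


# Toy data standing in for signatures read at two time points.
before = frequency_distribution(["ACGT", "ACGT", "TTGA", "GGCA"])
after = frequency_distribution(["ACGT", "TTGA", "TTGA", "TTGA"])
print(frequency_shift(before, after))
```

Comparing two such distributions is one simple way to follow a dynamically changing mRNA population; a real analysis would also need to account for sampling error at the sample sizes mentioned above.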
Example 1

Construction of a Tag Library 
An exemplary tag library is constructed as follows to form the chemically
synthesized 9-word tags of nucleotides A, G, and T defined by the formula:

3'-TGGC-[4(A,G,T)9]-CCCCp

where "[4(A,G,T)9]" indicates a tag mixture where each tag consists of nine 4-mer
words of A, G, and T; and "p" indicates a 5' phosphate. This mixture is ligated to the
following right and left primer binding regions (SEQ ID NO: 4 and SEQ ID NO: 5):

5'-AGTGGCTGGGCATCGGACCG          5'-GGGGCCCAGTCAGCGTCGAT
   TCACCGACCCGTAGCCp                    GGGTCAGTCGCAGCTA

          LEFT                               RIGHT

The right and left primer binding regions are ligated to the above tag mixture, after
which the single stranded portion of the ligated structure is filled with DNA
polymerase then mixed with the right and left primers indicated below and amplified
to give a tag library (SEQ ID NO: 6).

       Left Primer
   AGTGGCTGGGCATCGGACCG

5'-AGTGGCTGGGCATCGGACCG-[4(A,G,T)9]-GGGGCCCAGTCAGCGTCGAT
   TCACCGACCCGTAGCCTGGC-[4(A,G,T)9]-CCCCGGGTCAGTCGCAGCTA

                                    CCCCGGGTCAGTCGCAGCTA-5'
                                        Right Primer

The underlined portion of the left primer binding region indicates a Rsr II recognition
site. The left-most underlined region of the right primer binding region indicates
recognition sites for Bsp 120I, Apa I, and Eco O 109I, and a cleavage site for Hga I.
The right-most underlined region of the right primer binding region indicates the
recognition site for Hga I. Optionally, the right or left primers may be synthesized
with a biotin attached (using conventional reagents, e.g. available from Clontech
Laboratories, Palo Alto, CA) to facilitate purification after amplification and/or
cleavage.
   primer binding site        Ppu MI site

-CAAATTTG-CCTAGG-AGAAGGAGAAGGAGAAGG-
     ^       ^
     |    Bam HI site
 Pme I site

The plasmid is cleaved with Ppu MI and Pme I (to give a Rsr II-compatible end and a
flush end so that the insert is oriented) and then methylated with DAM methylase.
The tag-containing construct is cleaved with Rsr II and then ligated to the open
plasmid, after which the conjugate is cleaved with Mbo I and Bam HI to permit
ligation and closing of the plasmid. The plasmid is then amplified and isolated and
used in accordance with the invention.



Example 3

Changes in Gene Expression Profiles in Liver Tissue of Rats
Exposed to Various Xenobiotic Agents

In this experiment, to test the capability of the method of the invention to
detect genes induced as a result of exposure to xenobiotic compounds, the gene
expression profile of rat liver tissue is examined following administration of several
compounds known to induce the expression of cytochrome P-450 isoenzymes. The
results obtained from the method of the invention are compared to results obtained
from reverse transcriptase PCR measurements and immunochemical measurements of
the cytochrome P-450 isoenzymes. Protocols and materials for the latter assays are
described in Morris et al, Biochemical Pharmacology, 52: 781-792 (1996).

Male Sprague-Dawley rats between the ages of 6 and 8 weeks and weighing
200-300 g are used, and food and water are available to the animals ad lib. Test
compounds are phenobarbital (PB), metyrapone (MET), dexamethasone (DEX),
clofibrate (CLO), corn oil (CO), and β-naphthoflavone (BNF), and are available from
Sigma Chemical Co. (St. Louis, MO). Antibodies against specific P-450 enzymes are
available from the following sources: rabbit anti-rat CYP3A1 from Human Biologics,
Inc. (Phoenix, AZ); goat anti-rat CYP4A1 from Daiichi Pure Chemicals Co. (Tokyo,
Japan); monoclonal mouse anti-rat CYP1A1, monoclonal mouse anti-rat CYP2C11,
goat anti-rat CYP2E1, and monoclonal mouse anti-rat CYP2B1 from Oxford
Biochemical Research, Inc. (Oxford, MI). Secondary antibodies (goat anti-rabbit IgG,
rabbit anti-goat IgG, and goat anti-mouse IgG) are available from Jackson
ImmunoResearch Laboratories (West Grove, PA).

Animals are administered either PB (100 mg/kg), BNF (100 mg/kg), MET
(100 mg/kg), DEX (100 mg/kg), or CLO (250 mg/kg) for 4 consecutive days via
intraperitoneal injection following a dosing regimen similar to that described by
Wang et al, Arch. Biochem. Biophys. 290: 355-361 (1991). Animals treated with
H2O and CO are used as controls. Two hours following the last injection (day 4),
animals are killed, and the livers are removed. Livers are immediately frozen and
stored at -70°C.

Total RNA is prepared from frozen liver tissue using a modification of the
method described by Xie et al, Biotechniques, 11: 326-327 (1991). Approximately
100-200 mg of liver tissue is homogenized in the RNA extraction buffer described by
Xie et al to isolate total RNA. The resulting RNA is reconstituted in
diethylpyrocarbonate-treated water, quantified spectrophotometrically at 260 nm, and
adjusted to a concentration of 100 µg/ml. Total RNA is stored in
diethylpyrocarbonate-treated water for up to 1 year at -70°C without any apparent
degradation. RT-PCR and sequencing are performed on samples from these
preparations.
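
The quantification and adjustment step can be illustrated with a short calculation. This is a sketch under stated assumptions, not the protocol of Xie et al: it uses the conventional approximation that one A260 unit corresponds to about 40 µg/ml of RNA in a 1 cm path, a figure that does not appear in the text, and the function names and example readings are hypothetical.

```python
# Conventional approximation (assumption, not from the text):
# 1.0 A260 unit ~ 40 ug/ml single-stranded RNA in a 1 cm cuvette.
RNA_UG_PER_ML_PER_A260 = 40.0


def rna_concentration_ug_per_ml(a260: float, dilution_factor: float = 1.0) -> float:
    """Estimate the RNA concentration of the undiluted stock from an A260 reading."""
    return a260 * RNA_UG_PER_ML_PER_A260 * dilution_factor


def final_volume_ul(stock_ug_per_ml: float, stock_volume_ul: float,
                    target_ug_per_ml: float = 100.0) -> float:
    """Total volume to which the stock must be brought to reach the target
    concentration (100 ug/ml in the text)."""
    if stock_ug_per_ml < target_ug_per_ml:
        raise ValueError("stock is already more dilute than the target")
    return stock_ug_per_ml * stock_volume_ul / target_ug_per_ml


# Hypothetical reading: A260 = 0.5 measured on a 1:10 dilution of the stock.
stock = rna_concentration_ug_per_ml(0.5, dilution_factor=10)
print(stock, final_volume_ul(stock, stock_volume_ul=50.0))
```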

For sequencing, samples of RNA corresponding to about 0.5 µg of poly(A)+
RNA are used to construct libraries of tag-cDNA conjugates following the protocol
described in the section entitled "Attaching Tags to Polynucleotides for Sorting onto
Solid Phase Supports," with the following exception: the tag repertoire is constructed
from six 4-nucleotide words from Table II. Thus, the complexity of the repertoire is
8^6 or about 2.6 x 10^5. For each tag-cDNA conjugate library constructed, ten samples
of about ten thousand clones are taken for amplification and sorting. Each of the
amplified samples is separately applied to a fixed monolayer of about 10^6 10 µm
diameter GMA beads containing tag complements. That is, the "sample" of tag
complements in the GMA bead population on each monolayer is about four fold the
total size of the repertoire, thus ensuring there is a high probability that each of the
sampled tag-cDNA conjugates will find its tag complement on the monolayer. After
the oligonucleotide tags of the amplified samples are rendered single stranded as
described above, the tag-cDNA conjugates of the samples are separately applied to the
monolayers under conditions that permit specific hybridization only between
oligonucleotide tags and tag complements forming perfectly matched duplexes.
Concentrations of the amplified samples and hybridization times are selected to
permit the loading of about 5 x 10^4 to 2 x 10^5 tag-cDNA conjugates on each bead
where perfect matches occur. After ligation, 9-12 nucleotide portions of the attached
cDNAs are determined in parallel by the single base sequencing technique described
by Brenner in International patent application PCT/US95/03678. Frequency
distributions for the gene expression profiles are assembled from the sequence
information obtained from each of the ten samples.
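
The sampling arithmetic in the preceding paragraph can be sketched as follows. The calculation is illustrative and rests on assumptions not stated in the text: tags are treated as drawn uniformly and independently from the repertoire, each bead is treated as carrying one uniformly chosen tag complement, and the bead count is taken as roughly four times the repertoire size. The function names are hypothetical.

```python
def repertoire_size(words_per_tag: int, word_set_size: int = 8) -> int:
    """Number of distinct tags when each of words_per_tag positions holds
    one of word_set_size words (8 words, 6 positions in the text)."""
    return word_set_size ** words_per_tag


def fraction_with_unique_tag(sample_size: int, repertoire: int) -> float:
    """Expected fraction of sampled clones whose tag is shared with no
    other clone in the same sample (uniform sampling assumed)."""
    return (1 - 1 / repertoire) ** (sample_size - 1)


def prob_complement_on_monolayer(num_beads: int, repertoire: int) -> float:
    """Probability that a given tag complement appears on at least one bead
    (each bead assumed to carry one uniformly chosen complement)."""
    return 1 - (1 - 1 / repertoire) ** num_beads


size = repertoire_size(6)
print(size)
print(fraction_with_unique_tag(10_000, size))
print(prob_complement_on_monolayer(1_000_000, size))
```

Under these assumptions a sample of about ten thousand clones is small relative to the repertoire, so most sampled cDNAs carry a tag not shared within the sample, and a bead population about four times the repertoire makes it likely that each sampled tag has its complement present.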

RT-PCRs of selected mRNAs corresponding to cytochrome P-450 genes and
the constitutively expressed cyclophilin gene are carried out as described in Morris et
al (cited above). Briefly, a 20 µL reaction mixture is prepared containing 1x reverse
transcriptase buffer (Gibco BRL), 10 nM dithiothreitol, 0.5 nM dNTPs, 2.5 µM oligo
d(T)15 primer, 40 units RNasin (Promega, Madison, WI), 200 units RNase H-reverse
transcriptase (Gibco BRL), and 400 ng of total RNA (in diethylpyrocarbonate-treated
water). The reaction is incubated for 1 hour at 37°C followed by inactivation of the
enzyme at 95°C for 5 min. The resulting cDNA is stored at -20°C until used. For
PCR amplification of cDNA, a 10 µL reaction mixture is prepared containing 10x
polymerase reaction buffer, 2 mM MgCl2, 1 unit Taq DNA polymerase (Perkin-
Elmer, Norwalk, CT), 20 ng cDNA, and 200 nM concentration of the 5' and 3'
specific PCR primers of the sequences described in Morris et al (cited above). PCRs
are carried out in a Perkin-Elmer 9600 thermal cycler for 23 cycles using melting,
annealing, and extension conditions of 94°C for 30 sec, 56°C for 1 min., and 72°C
for 1 min., respectively. Amplified cDNA products are separated by PAGE using 5%
native gels. Bands are detected by staining with ethidium bromide.

Western blots of the liver proteins are carried out using standard protocols
after separation by SDS-PAGE. Briefly, proteins are separated on 10% SDS-PAGE
gels under reducing conditions and immunoblotted for detection of P-450 isoenzymes
using a modification of the methods described in Harris et al, Proc. Natl. Acad. Sci.,
88: 1407-1410 (1991). Proteins are loaded at 50 µg/lane and resolved under constant
current (250 V) for approximately 4 hours at 2°C. Proteins are transferred to
nitrocellulose membranes (Bio-Rad, Hercules, CA) in 15 mM Tris buffer containing
120 mM glycine and 20% (v/v) methanol. The nitrocellulose membranes are blocked
with 2.5% BSA and immunoblotted for P-450 isoenzymes using primary monoclonal
and polyclonal antibodies and secondary alkaline phosphatase conjugated anti-IgG.
Immunoblots are developed with the Bio-Rad alkaline phosphatase substrate kit.

The three types of measurements of P-450 isoenzyme induction showed
substantial agreement.
APPENDIX Ia

Exemplary computer program for generating 
minimally cross hybridizing sets 
(single stranded tag/single stranded tag complement) 



      Program minxh
c
c
      integer*2 sub1(6),mset1(1000,6),mset2(1000,6)
      dimension nbase(6)
c
c
      write(*,*)'ENTER SUBUNIT LENGTH'
      read(*,100)nsub
100   format(i1)
      open(1,file='sub4.dat',form='formatted',status='new')
c
c
      nset=0
      do 7000 m1=1,3
      do 7000 m2=1,3
      do 7000 m3=1,3
      do 7000 m4=1,3
      sub1(1)=m1
      sub1(2)=m2
      sub1(3)=m3
      sub1(4)=m4
c
c
      ndiff=3
c
c
c           Generate set of subunits differing from
c           sub1 by at least ndiff nucleotides.
c           Save in mset1.
c
c
      jj=1
      do 900 j=1,nsub
900      mset1(1,j)=sub1(j)
c
c
      do 1000 k1=1,3
      do 1000 k2=1,3
      do 1000 k3=1,3
      do 1000 k4=1,3
c
c
      nbase(1)=k1
      nbase(2)=k2
      nbase(3)=k3
      nbase(4)=k4
c
      n=0
      do 1200 j=1,nsub
         if(sub1(j).eq.1 .and. nbase(j).ne.1 .or.
     1      sub1(j).eq.2 .and. nbase(j).ne.2 .or.
     3      sub1(j).eq.3 .and. nbase(j).ne.3) then
            n=n+1
         endif
1200  continue
c
c
      if(n.ge.ndiff) then
c
c
c           If number of mismatches
c           is greater than or equal
c           to ndiff then record
c           subunit in matrix mset
c
c
         jj=jj+1
         do 1100 i=1,nsub
1100        mset1(jj,i)=nbase(i)
      endif
c
c
1000  continue
c
c
      do 1325 j2=1,nsub
         mset2(1,j2)=mset1(1,j2)
1325     mset2(2,j2)=mset1(2,j2)
c
c
c           Compare subunit 2 from
c           mset1 with each successive
c           subunit in mset1, i.e. 3,
c           4,5, ... etc. Save those
c           with mismatches .ge. ndiff
c           in matrix mset2 starting at
c           position 2.
c           Next transfer contents
c           of mset2 into mset1 and
c           start
c           comparisons again this time
c           starting with subunit 3.
c           Continue until all subunits
c           undergo the comparisons.
c
c
      npass=0
c
c
1700  continue
      kk=npass+2
      npass=npass+1
c
c
      do 1500 m=npass+2,jj
         n=0
         do 1600 j=1,nsub
            if(mset1(npass+1,j).eq.1.and.mset1(m,j).ne.1.or.
     2         mset1(npass+1,j).eq.2.and.mset1(m,j).ne.2.or.
     2         mset1(npass+1,j).eq.3.and.mset1(m,j).ne.3) then
               n=n+1
            endif
1600     continue
         if(n.ge.ndiff) then
            kk=kk+1
            do 1625 i=1,nsub
1625           mset2(kk,i)=mset1(m,i)
         endif
1500  continue
c
c           kk is the number of subunits
c           stored in mset2
c
c           Transfer contents of mset2
c           into mset1 for next pass.
c
c
      do 2000 k=1,kk
         do 2000 m=1,nsub
2000        mset1(k,m)=mset2(k,m)
      if(kk.lt.jj) then
         jj=kk
         goto 1700
      endif
c
c
      nset=nset+1
      write(1,7009)
7009  format(/)
      do 7008 k=1,kk
7008     write(1,7010)(mset1(k,m),m=1,nsub)
7010  format(4i1)
      write(*,*)
      write(*,120) kk,nset
120   format(1x,'Subunits in set=',i5,2x,'Set No=',i5)
7000  continue
      close(1)
c
c
      end
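
For readers who want to experiment with the idea behind program minxh, the pruning procedure described in its comments can be sketched in a few lines of Python. This is an illustrative reimplementation and not a translation of the listing: the function names are hypothetical, the letters A, G, and T stand in for the numeric codes 1-3 (an assumption based on the tags of Example 1), and the loop always runs until every remaining pair has been compared.

```python
from itertools import product


def mismatches(a, b):
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def prune_to_min_cross_hybridizing(seed, alphabet="AGT", ndiff=3):
    """Start from all words differing from seed in at least ndiff positions,
    then repeatedly keep the first remaining word and discard every later
    word that differs from it in fewer than ndiff positions."""
    pool = [seed] + [
        "".join(w)
        for w in product(alphabet, repeat=len(seed))
        if mismatches(seed, w) >= ndiff
    ]
    kept = []
    while pool:
        head, rest = pool[0], pool[1:]
        kept.append(head)
        pool = [w for w in rest if mismatches(head, w) >= ndiff]
    return kept


words = prune_to_min_cross_hybridizing("AAAA")
print(len(words), words)
```

Every pair of words in the returned list differs in at least ndiff positions, which is the defining property of a minimally cross-hybridizing set as the term is used in the appendix headings.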
APPENDIX Ib

Exemplary computer program for generating 
minimally cross hybridizing sets 
(single stranded tag/single stranded tag complement) 



      Program tagN
c
c
c     Program tagN generates minimally cross-hybridizing
c     sets of subunits given i) N--subunit length, and ii)
c     an initial subunit sequence. tagN assumes that only
c     3 of the four natural nucleotides are used in the tags.
c
c
      character*1 sub1(20)
      integer*2 mset(10000,20), nbase(20)
c
c
      write(*,*)'ENTER SUBUNIT LENGTH'
      read(*,100)nsub
100   format(i2)
c
c
      write(*,*)'ENTER SUBUNIT SEQUENCE'
      read(*,110)(sub1(k),k=1,nsub)
110   format(20a1)
c
c
      ndiff=10
c
c
c           Let a=1 c=2 g=3 & t=4
c
c
      do 800 kk=1,nsub
      if(sub1(kk).eq.'a') then
         mset(1,kk)=1
      endif
      if(sub1(kk).eq.'c') then
         mset(1,kk)=2
      endif
      if(sub1(kk).eq.'g') then
         mset(1,kk)=3
      endif
      if(sub1(kk).eq.'t') then
         mset(1,kk)=4
      endif
800   continue
c
c
c           Generate set of subunits differing from
c           sub1 by at least ndiff nucleotides.
c
c
      jj=1
c
c
      do 1000 k1=1,3
      do 1000 k2=1,3
      do 1000 k3=1,3
      do 1000 k4=1,3
      do 1000 k5=1,3
      do 1000 k6=1,3
      do 1000 k7=1,3
      do 1000 k8=1,3
      do 1000 k9=1,3
      do 1000 k10=1,3
      do 1000 k11=1,3
      do 1000 k12=1,3
      do 1000 k13=1,3
      do 1000 k14=1,3
      do 1000 k15=1,3
      do 1000 k16=1,3
      do 1000 k17=1,3
      do 1000 k18=1,3
      do 1000 k19=1,3
      do 1000 k20=1,3
c
c
      nbase(1)=k1
      nbase(2)=k2
      nbase(3)=k3
      nbase(4)=k4
      nbase(5)=k5
      nbase(6)=k6
      nbase(7)=k7
      nbase(8)=k8
      nbase(9)=k9
      nbase(10)=k10
      nbase(11)=k11
      nbase(12)=k12
      nbase(13)=k13
      nbase(14)=k14
      nbase(15)=k15
      nbase(16)=k16
      nbase(17)=k17
      nbase(18)=k18
      nbase(19)=k19
      nbase(20)=k20
c
c
      do 1250 nn=1,jj
c
      n=0
      do 1200 j=1,nsub
         if(mset(nn,j).eq.1 .and. nbase(j).ne.1 .or.
     1      mset(nn,j).eq.2 .and. nbase(j).ne.2 .or.
     2      mset(nn,j).eq.3 .and. nbase(j).ne.3 .or.
     3      mset(nn,j).eq.4 .and. nbase(j).ne.4) then
            n=n+1
         endif
1200  continue
c
c
      if(n.lt.ndiff) then
         goto 1000
      endif
1250  continue
c
c
      jj=jj+1
      write(*,130)(nbase(i),i=1,nsub),jj
      do 1100 i=1,nsub
         mset(jj,i)=nbase(i)
1100  continue
c
c
1000  continue
c
c
      write(*,*)
130   format(10x,20(1x,i1),5x,i5)
      write(*,*)
      write(*,120) jj
120   format(1x,'Number of words=',i5)
c
c
      end
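
Program tagN grows its set incrementally: each candidate word is accepted only if it differs from every word already accepted in at least ndiff positions. The short Python sketch below illustrates that acceptance rule under stated assumptions; it is not a translation of the listing. The function name, the max_words cap, and the letter alphabet are hypothetical choices for the example. Exhaustive enumeration over twenty positions of a three-letter alphabet covers 3^20 (about 3.5 x 10^9) candidates, so the example uses a short word length.

```python
from itertools import product


def grow_word_set(seed, alphabet, ndiff, max_words=None):
    """Greedy construction: scan candidate words in lexicographic order and
    accept a candidate only if it differs from every accepted word in at
    least ndiff positions. The seed is always the first accepted word."""
    accepted = [seed]
    for cand in product(alphabet, repeat=len(seed)):
        if all(sum(x != y for x, y in zip(cand, w)) >= ndiff for w in accepted):
            accepted.append("".join(cand))
            if max_words is not None and len(accepted) >= max_words:
                break
    return accepted


# Small example: 4-letter words over A, G, T that differ pairwise in >= 3 positions.
print(grow_word_set("AAAA", "AGT", ndiff=3))
```

The optional max_words argument is a convenience for longer words, where stopping early is the only practical way to use a brute-force scan of this kind.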
APPENDIX Ic

Exemplary computer program for generating
minimally cross hybridizing sets 
(double stranded tag/single stranded tag complement) 



      Program 3tagN
c
c
c     Program 3tagN generates minimally cross-hybridizing
c     sets of duplex subunits given i) N--subunit length,
c     and ii) an initial homopurine sequence.
c
c
      character*1 sub1(20)
      integer*2 mset(10000,20), nbase(20)
c
c
      write(*,*)'ENTER SUBUNIT LENGTH'
      read(*,100)nsub
100   format(i2)
c
c
      write(*,*)'ENTER SUBUNIT SEQUENCE a & g only'
      read(*,110)(sub1(k),k=1,nsub)
110   format(20a1)
c
      ndiff=10
c
c           Let a=1 and g=2
c
      do 800 kk=1,nsub
      if(sub1(kk).eq.'a') then
         mset(1,kk)=1
      endif
      if(sub1(kk).eq.'g') then
         mset(1,kk)=2
      endif
800   continue
c
      jj=1
c
      do 1000 k1=1,3
      do 1000 k2=1,3
      do 1000 k3=1,3
      do 1000 k4=1,3
      do 1000 k5=1,3
      do 1000 k6=1,3
      do 1000 k7=1,3
      do 1000 k8=1,3
      do 1000 k9=1,3
      do 1000 k10=1,3
      do 1000 k11=1,3
      do 1000 k12=1,3
      do 1000 k13=1,3
      do 1000 k14=1,3
      do 1000 k15=1,3
      do 1000 k16=1,3
      do 1000 k17=1,3
      do 1000 k18=1,3
      do 1000 k19=1,3
      do 1000 k20=1,3
c
c
      nbase(1)=k1
      nbase(2)=k2
      nbase(3)=k3
      nbase(4)=k4
      nbase(5)=k5
      nbase(6)=k6
      nbase(7)=k7
      nbase(8)=k8
      nbase(9)=k9
      nbase(10)=k10
      nbase(11)=k11
      nbase(12)=k12
      nbase(13)=k13
      nbase(14)=k14
      nbase(15)=k15
      nbase(16)=k16
      nbase(17)=k17
      nbase(18)=k18
      nbase(19)=k19
      nbase(20)=k20
c
c
      do 1250 nn=1,jj
      n=0
      do 1200 j=1,nsub
         if(mset(nn,j).eq.1 .and. nbase(j).ne.1 .or.
     1      mset(nn,j).eq.2 .and. nbase(j).ne.2 .or.
     2      mset(nn,j).eq.3 .and. nbase(j).ne.3 .or.
     3      mset(nn,j).eq.4 .and. nbase(j).ne.4) then
            n=n+1
         endif
1200  continue
c
      if(n.lt.ndiff) then
         goto 1000
      endif
1250  continue
c
      jj=jj+1
      write(*,130)(nbase(i),i=1,nsub),jj
      do 1100 i=1,nsub
         mset(jj,i)=nbase(i)
1100  continue
c
1000  continue
c
      write(*,*)
130   format(10x,20(1x,i1),5x,i5)
      write(*,*)
      write(*,120) jj
120   format(1x,'Number of words=',i5)
c
c
      end
SEQUENCE LISTING

(1) GENERAL INFORMATION:



(i) APPLICANT: David W. Martin, Jr. 



(ii) TITLE OF INVENTION: Measurement of Gene Expression Profiles in
Toxicity Determination



(iii) NUMBER OF SEQUENCES: 7 



(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Stephen C. Macevicz, Lynx Therapeutics, Inc. 

(B) STREET: 3832 Bay Center Place 

(C) CITY: Hayward 

(D) STATE: California 

(E) COUNTRY: USA 

(F) ZIP: 94545 



(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: 3.5 inch diskette 

(B) COMPUTER: IBM compatible 

(C) OPERATING SYSTEM: Windows 3.1 
(D) SOFTWARE: Microsoft Word 5.1



(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 

(C) CLASSIFICATION: 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: PCT/US96/09513

(B) FILING DATE: 06-JUN-96 



(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: PCT/US95/12791
(B) FILING DATE: 12-OCT-95

(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: Stephen C. Macevicz 

(B) REGISTRATION NUMBER: 30,285 

(C) REFERENCE/DOCKET NUMBER: 813wo 



(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: (510) 670-9365 

(B) TELEFAX: (510) 670-9302 



(2) INFORMATION FOR SEQ ID NO: 1: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 11 nucleotides 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:



CTAGTCGACC A 



(2) INFORMATION FOR SEQ ID NO: 2:



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 11 nucleotides 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 



NRRGATCYNN N 



(2) INFORMATION FOR SEQ ID NO: 3:



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 38 nucleotides 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 



GAGGATGCCT TTATGGATCC ACTCGAGATC CCAATCCA 



(2) INFORMATION FOR SEQ ID NO: 4: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 nucleotides 
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:



AGTGGCTGGG CATCGGACCG 



(2) INFORMATION FOR SEQ ID NO: 5: 



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 nucleotides 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:



GGGGCCCAGT CAGCGTCGAT 



20 



(2) INFORMATION FOR SEQ ID NO: 6: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 nucleotides 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 



ATCGACGCTG ACTGGGCCCC 



16 



(2) INFORMATION FOR SEQ ID NO: 7: 



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 62 nucleotides 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 



AAAAGGAGGA GGCCTTGATA GAGAGGACCT GTTTAAACGG ATCCTCTTCC 
TCTTCCTCTT CC 
I claim:

1. A method of determining the toxicity of a compound, the method comprising
the steps of:

administering the compound to a test organism;

extracting a population of mRNA molecules from each of one or more tissues
of the test organism;

forming a separate population of cDNA molecules from each population of
mRNA molecules from the one or more tissues such that each cDNA molecule of a
separate population has an oligonucleotide tag attached, the oligonucleotide tags
being selected from the same minimally cross-hybridizing set;

separately sampling each population of cDNA molecules such that
substantially all different cDNA molecules within a separate population have different
oligonucleotide tags attached;

sorting the cDNA molecules of each separate population by specifically
hybridizing the oligonucleotide tags with their respective complements, the respective
complements being attached as uniform populations of substantially identical
complements in spatially discrete regions on one or more solid phase supports;

determining the nucleotide sequence of a portion of each of the sorted cDNA
molecules of each separate population to form a frequency distribution of expressed
genes for each of the one or more tissues; and

correlating the frequency distribution of expressed genes in each of the one or
more tissues with the toxicity of the compound.

2. The method of claim 1 wherein said oligonucleotide tag and said complement
of said oligonucleotide tag are single stranded.

3. The method of claim 2 wherein said oligonucleotide tag consists of a plurality
of subunits, each subunit consisting of an oligonucleotide of 3 to 9 nucleotides in
length and each subunit being selected from the same minimally cross-hybridizing set.

4. The method of claim 3 wherein said one or more solid phase supports are
microparticles and wherein said step of sorting said cDNA molecules onto the
microparticles produces a subpopulation of loaded microparticles and a subpopulation
of unloaded microparticles.

5. The method of claim 4 further including a step of separating said loaded
microparticles from said unloaded microparticles.
6. The method of claim 5 further including a step of repeating said steps of
sampling, sorting, and separating until a number of said loaded microparticles is
accumulated is at least 10,000.

7. The method of claim 6 wherein said number of loaded microparticles is at
least 100,000.

8. The method of claim 7 wherein said number of loaded microparticles is at
least 500,000.

9. The method of claim 5 further including a step of repeating said steps of
sampling, sorting, and separating until a number of said loaded microparticles is
accumulated is sufficient to estimate the relative abundance of a cDNA molecule
present in said population at a frequency within the range of from 0.1% to 5% with a
95% confidence limit no larger than 0.1% of said population.

10. The method of claim 4 wherein said test organism is a mammalian tissue
culture.

11. The method of claim 10 wherein said mammalian tissue culture comprises
hepatocytes.

12. The method of claim 4 wherein said test organism is an animal selected from
the group consisting of rats, mice, hamsters, guinea pigs, rabbits, cats, dogs, pigs, and
monkeys.

13. The method of claim 12 wherein said one or more tissues are selected from the
group consisting of liver, kidney, brain, cardiovascular, thyroid, spleen, adrenal, large
intestine, small intestine, pancreas, urinary bladder, stomach, ovary, testes, and
mesenteric lymph nodes.

14. A method of identifying genes which are differentially expressed in a selected
tissue of a test animal after treatment with a compound, the method comprising the
steps of:

administering the compound to a test animal;
extracting a population of mRNA molecules from the selected tissue of the
test animal;

forming a population of cDNA molecules from the population of mRNA
molecules such that each cDNA molecule has an oligonucleotide tag attached, the
oligonucleotide tags being selected from the same minimally cross-hybridizing set;

sampling the population of cDNA molecules such that substantially all
different cDNA molecules have different oligonucleotide tags attached;

sorting the cDNA molecules by specifically hybridizing the oligonucleotide
tags with their respective complements, the respective complements being attached as
uniform populations of substantially identical complements in spatially discrete
regions on one or more solid phase supports;

determining the nucleotide sequence of a portion of each of the sorted cDNA
molecules to form a frequency distribution of expressed genes; and

identifying genes expressed in response to administering the compound by
comparing the frequency distribution of expressed genes of the selected tissue of the
test animal with a frequency distribution of expressed genes of the selected tissue of a
control animal.

15. The method of claim 14 wherein said oligonucleotide tag and said
complement of said oligonucleotide tag are single stranded.

16. The method of claim 15 wherein said oligonucleotide tag consists of a
plurality of subunits, each subunit consisting of an oligonucleotide of 3 to 9
nucleotides in length and each subunit being selected from the same minimally
cross-hybridizing set.

17. The method of claim 16 wherein said one or more solid phase supports are
microparticles and wherein said step of sorting said cDNA molecules onto the
microparticles produces a subpopulation of loaded microparticles and a subpopulation
of unloaded microparticles.

18. The method of claim 17 further including a step of separating said loaded
microparticles from said unloaded microparticles.

19. The method of claim 18 further including a step of repeating said steps of
sampling, sorting, and separating until a number of said loaded microparticles is
accumulated is at least 10,000.
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20. The method of claim 19 wherein said number of loaded microparticles is at 
least 100.000. 

2 1 . The method of claim 20 wherein said number of loaded microparticles is at 
5 least 500,000. 

22. The method of claim 18 further including a step of repeating said steps of 
sampling, sorting, and separating until a number of said loaded microparticles is 
accumulated is sufficient to estimate the relative abundance of a cDNA molecule 

1 0 present in said population at a frequency within the range of from 0. 1 % to' 5% with a 
95% confidence limit no larger than 0. 1 % of said population. 

23. The method of claim 1 7 wherein said test animal is selected from the group 
consisting of rats, mice, banisters, guinea pigs, rabbits, cats, dogs, pigs, and monkeys. 

15 

24. The method of claim 23 wherein said selected tissue is selected from the 
group consisting of liver, kidney, brain, cardiovascular, thyroid, spleen, adrenal, large 
intestine, small intestine, pancrease urinary bladder, stomach, ovary, testes, and 

- mesenteric lymph nodes. 

20 . 

25. A use of the technique of massively parallel signature sequencing to determine 
the toxicity of a compound in a test organism, the use comprising the steps of: 

administering the compound to a test organism; 

extracting a population of mRNA molecules from each of one or more tissues 
of the test organism and forming a population of cDNA molecules for each of the one
or more tissues; 

determining the nucleotide sequence of a portion of each of the cDNA 
molecules of each separate population using massively parallel signature sequencing 
to form a frequency distribution of expressed genes for each of the one or more 
tissues; and

correlating the frequency distribution of expressed genes in each of the one or 
more tissues with the toxicity of the compound. 

26. The use of claim 25 wherein said test organism is a mammalian tissue culture. 

27. The use of claim 26 wherein said mammalian tissue culture comprises 
hepatocytes. 

28. The use of claim 25 wherein said test organism is an animal selected from the
group consisting of rats, mice, hamsters, guinea pigs, rabbits, cats, dogs, pigs, and 
monkeys. 

29. The use of claim 28 wherein said one or more tissues are selected from the
group consisting of liver, kidney, brain, cardiovascular, thyroid, spleen, adrenal, large
intestine, small intestine, pancreas, urinary bladder, stomach, ovary, testes, and
mesenteric lymph nodes.

30. A use of the technique of massively parallel signature sequencing to identify
genes which are differentially expressed in a test organism after treatment with a 
compound and which are correlated with toxicity of the compound, the use 
comprising the steps of: 

administering the compound to the test organism; 

extracting a population of mRNA molecules from a selected tissue of the test
organism and forming a population of cDNA molecules;

determining the nucleotide sequence of a portion of each of the cDNA 
molecules using massively parallel signature sequencing to form a frequency 
distribution of expressed genes;

identifying genes expressed in response to administering the compound by
comparing the frequency distribution of expressed genes of the selected tissue of the
test organism with a frequency distribution of expressed genes of the selected tissue 
of a control organism; and 

determining whether the genes expressed in response to administering the 
compound are correlated with toxicity of the compound in the test organism.
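
As a rough illustration of the stopping criterion recited in claim 22, the number of signatures needed can be estimated from the normal approximation to the binomial distribution. This is only a sketch under stated assumptions: each loaded microparticle is treated as one independent draw from the cDNA population, the "95% confidence limit no larger than 0.1%" is read as a confidence-interval half-width of 0.001, and the function name `signatures_needed` is hypothetical. None of these choices is taken from the claims themselves.

```python
import math

Z_95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def signatures_needed(frequency, half_width=0.001, z=Z_95):
    """Smallest n for which z * sqrt(p * (1 - p) / n) <= half_width.

    frequency  -- assumed true relative abundance p of the cDNA (0 < p < 1)
    half_width -- assumed reading of the "confidence limit" as an interval half-width
    """
    p = frequency
    return math.ceil(z * z * p * (1.0 - p) / (half_width ** 2))


if __name__ == "__main__":
    # Frequencies spanning the 0.1% to 5% range named in claim 22.
    for p in (0.001, 0.01, 0.05):
        print(f"frequency {p:.1%}: about {signatures_needed(p):,} signatures")
```

Under these assumptions the 5% end of the range is the most demanding case and calls for roughly 180,000 signatures, which is the same order of magnitude as the counts recited in claims 20 and 21.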

Copyright 1997 PR Newswire Association, Inc.
PR Newswire 

August 11, 1997, Monday

SECTION: Financial News 

DISTRIBUTION: TO BUSINESS AND MEDICAL EDITORS 
LENGTH: 478 words 

HEADLINE: Eli Lilly & Co. and Acacia Biosciences Enter Into Research Collaboration; 
First Corporate Agreement for Acacia's Genome Reporter Matrix(TM) 

DATELINE: RICHMOND, Calif., Aug. 11 

BODY:

Acacia Biosciences and Eli Lilly and Company (Lilly) announced today the signing of a joint research collaboration
to utilize Acacia's Genome Reporter Matrix(TM) (GRM) to aid in the selection and optimization of lead compounds.
Under the collaboration, Acacia will provide chemical and biological profiles on a class of Lilly's compounds for an
undisclosed fee.

Acacia's GRM is an assay-based computer modeling system that uses yeast as a miniature ecosystem. The GRM 
can profile the extent, nature and quantity of any changes in gene expression. Because of the similarities between 
the yeast and human genome, the system serves as an excellent surrogate for the human body, mimicking the effects 
induced by a biologically active molecule. 

"Using yeast as a model organism for lead optimization makes a lot of sense given the high degree of homology with
human metabolic pathways," said William Current of Lilly Research Laboratories. "Acacia's innovative GRM has
the potential to provide enormous insight into the therapeutic impact of our compounds and make the drug discovery
process more rational. It should substantially accelerate the development process."

"This first agreement with a major pharmaceutical company is an important milestone in the development of 
Acacia," said Bruce Cohen, President and CEO of Acacia. "The deal is in line with our strategy of establishing 
alliances that will allow our collaborators to use genomic profiles to identify and optimize compounds within
their existing portfolios. In the long run, this technology can be used to characterize large scale combinatorial
libraries, predict side effects prior to clinical trials and resurrect drugs that have failed during clinical trials." 

The GRM incorporates two critical elements: chemical response profiles and genetic response profiles. The 
chemical response profiles measure the change in gene expression caused by potential therapeutics and then rank genes 
with altered expressions by degree of response. The genetic response profiles measure changes in gene expression
caused by mutations in the genes encoding potential targets of pharmaceuticals; these genetic response profiles represent 
gold standards in drug discovery by defining the response profile expected for drugs with perfect selectivity and 
specificity. By comparing the two profiles, one can analyze a potential drug candidate's ability to mimic the action of
a 'perfect' drug.
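
The ranking step described for the chemical response profile can be pictured with a minimal sketch. It assumes each gene has one reporter reading with and without the compound and that "degree of response" is the absolute log2 fold change; the release does not say how Acacia actually scores responses. The function name `rank_by_response` and the gene names and readings are hypothetical.

```python
import math


def rank_by_response(treated, untreated):
    """Return (gene, log2 ratio) pairs sorted by absolute change, largest first.

    treated, untreated -- dicts mapping gene name to a positive reporter reading.
    Genes missing from either dict, or with non-positive readings, are skipped.
    """
    ratios = {
        gene: math.log2(treated[gene] / untreated[gene])
        for gene in treated
        if gene in untreated and treated[gene] > 0 and untreated[gene] > 0
    }
    return sorted(ratios.items(), key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    # Hypothetical reporter readings in arbitrary units, not data from the release.
    untreated = {"geneA": 100.0, "geneB": 100.0, "geneC": 100.0}
    treated = {"geneA": 400.0, "geneB": 90.0, "geneC": 12.5}
    for gene, change in rank_by_response(treated, untreated):
        print(gene, round(change, 2))
```

Sorting on the absolute value places strongly repressed genes alongside strongly induced ones, which seems closest to ranking "by degree of response" rather than by direction.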

Acacia Biosciences is a functional genomics company developing proprietary technologies to enhance the speed 
and efficacy of drug discovery and development. Acacia's Genome Reporter Matrix capitalizes on the latest advances 
in genomics and combinatorial chemistry to generate comprehensive profiles of drug candidates' in vivo activity. 
SOURCE Acacia Biosciences 

CONTACT: Bruce Cohen, President and CEO of Acacia Biosciences, 510-669-2330 ext. 103 or Media: Linda
Seaton of Feinstein 
Pharmagene Raises More Capital for Research on Human Tissues

By Sophia Fra

Pharmagene, the Royston,
M. cal company specialising in the use of human biomaterials for

£5 million from a group of investors led by 3i and Abacus Nominees. The funding will enable the company to
expand both its human biomaterials collection and its capabilities across a range of proprietary platform
technologies.

Gordon Baxter, Ph.D., Pharmagene's cofounder and chief operating officer, claimed "by the end of this year
Pharmagene will have access to the largest collection of human RNAs and proteins anywhere in the world, and a
range of innovative, yet robust technologies
SEE PHARMAGENE



Perkin-Elmer Acquires PerSeptive to Expand
Its Capabilities in Gene-Based Drug Discovery



By John Sterling

Perkin-Elmer's (PE; Norwalk, CT) decision last month to acquire PerSeptive Biosystems (Framingham, MA) via a
$360 million stock swap was designed to strengthen PE in terms of broad capabilities in gene-based drug
discovery. The company's main goal is to develop new products to improve the integration of genetic and
protein research.

"This merger will enhance our position as an effective provider of innovative, integrated platforms enabling
our customers to be more efficient and cost-effective in bringing new pharmaceuticals to market," says
Tony L. White, PE's chairman, president and CEO. "The combination of our two companies should bolster our
presence in the life sciences, [and it is our] belief that we must take bold action now to lead the emerging
era of molecular medicine with leading positions in both genetic and protein analysis."

A driving force behind the merger is the vast amount of genet-



FDA OKs Genzyme's Carticel 
Product for Damage to Knees 



[Figure: Carticel procedure. Labels: Defect; Periosteal flap; Genzyme Tissue Repair; Cell Processing.]

Carticel, which was approved for the repair of clinically significant, symptomatic cartilaginous defects of the
femoral condyle (medial, lateral or trochlear) caused by acute or repetitive trauma, employs a proprietary
process to grow autologous cartilage cells for implantation.



By Naomi Pfeiffer

The FDA has approved a knee-cartilage replacement product made by Genzyme Tissue Repair (Cambridge, MA), a
tracking-stock division of Genzyme Corp., for people with trauma-damaged knees.

Carticel (autologous cultured chondrocytes) is the first product to be licensed under the FDA's pro-
SEE GENZYME
Sticky Ends

Avigen received two grants from the NIH & University of California for research on gene therapy for
treatment of cancer & HIV infections...KRI. Pharmaceutical Services, of Reston, VA, launched the TSK Bug
Finder, which is able to locate & retrieve client-specified microorganisms in real-time...Gensia Sicor,
Inc. will move its corporate staff from San Diego to Irvine, CA, by end of year...




ic information about human disease that is being accumulated by researchers and biotech companies working
in the area of genomics. It is becoming increasingly obvious that these data need to be complemented with
technologies for studying proteins and protein networks, a field known as proteomics (see GEN, September 1,
1997, p. 1).

PE officials, who claim that MALDI-TOF (Matrix Assisted
SEE ACQUISITION, P. 10



Strategies for Target Validation 
Streamline Evaluation of Leads



By Vicki Glaser

Acacia Biosciences (Richmond, CA) last month announced its first agreement with a major pharmaceutical
company, signing a deal with Eli Lilly (Indianapolis, IN) to use Acacia's Genome Reporter Matrix (GRM) to
select and optimize some of Lilly's lead compounds. Acacia's yeast-based system for profiling drug activity is
useful for evaluating the therapeutic potential of lead compounds, and it also has a role in the
identification and validation of new drug targets.

"We're using the ecosystem of a cell to allow us to deduce the mechanism of action and target for any
chemical," explains Bruce Cohen, president and CEO. "We screen for every target in a cell
simultaneously...using transcription as a readout for how a cell is adapting to any perturbation," he says.

The GRM technology consists of two main databases: one is the genetic response profile, showing the effects
of mutations in each individual yeast gene and compensatory gene regulatory mechanisms; the other is the
chemical response profile, which documents changes in gene expression in response to chemical compounds.
Computational analysis and pattern matching between the genetic and chemical profiles yields information on
the specificity, potency and side-effects risk of a drug lead.
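
The pattern matching between genetic and chemical profiles mentioned above might look something like the following minimal sketch. It assumes every profile is a list of expression changes over the same ordered set of reporter genes and uses Pearson correlation as the similarity score; the article does not say which measure Acacia uses. The names `pearson` and `best_matching_mutant`, and all profile values, are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no pattern to match
    return cov / math.sqrt(var_x * var_y)


def best_matching_mutant(chemical_profile, genetic_profiles):
    """Return (mutant name, correlation) for the most similar genetic profile."""
    scores = {
        name: pearson(chemical_profile, profile)
        for name, profile in genetic_profiles.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    # Hypothetical log-ratio profiles over four reporter genes.
    chemical = [2.0, -1.0, 0.0, 1.5]
    genetic = {
        "mutant_X": [1.9, -1.1, 0.1, 1.4],
        "mutant_Y": [-2.0, 1.0, 0.0, -1.5],
    }
    name, score = best_matching_mutant(chemical, genetic)
    print(name, round(score, 3))
```

In this toy case the compound's profile tracks `mutant_X` closely, which under the article's logic would point to the gene mutated in that strain as a candidate target.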

Targeting Targets

No longer is mapping and sequencing a gene, or the human genome, an end unto itself, but
SEE TARGET



FDA accepted NDA from Sepracor for levalbuterol HCl inhalation solution...An $11.7M mezzanine financing
has been closed by Activated Cell Therapy, which changed its name to Dendreon Corporation...Astra AB will
build major research facility in Waltham, MA, and is also relocating Astra Arcus research facility from
Rochester to Boston area...Prolifix Ltd. team used a small peptide to inhibit the E2F protein complex and
induced apoptosis in mammalian tumor cells...Vertex Pharmaceuticals, Inc. and Alpha Therapeutic Corp. ended
an agreement to develop VX-366 for treatment of inherited hemoglobin disorders...NaviCyte received Phase I
SBIR grant for up to $100,000 from NIH for development of prototype of its NaviFlow technology for
high-throughput screening...Covance Inc. will invest $21 million in expansion and renovation of its facility
in Indianapolis, IN.
Target

merely a means to an end. The critical next step is to validate the gene and its protein product as a
potential drug target. The Human Genome Project continues to produce a treasure chest of expressed sequence
tags (ESTs) and a tantalizing array of complete gene sequences.

Companies are applying a variety of functional genomic strategies to link genes to specific diseases and to
multigenic phenotypes. Yet the ultimate challenge for pharmaceutical companies is to sift through all the
sequence and differential gene expression data to identify the best targets for drug discovery.

Spinning off technology developed at the University of North Carolina (Chapel Hill), Cytogen Corp.
(Princeton, NJ) formed its wholly owned subsidiary AxCell Biosciences earlier this year. The young company is
building a protein interaction database, cataloging all the interactions the modular domains of proteins can
engage in with a range of ligands, in order to gain insight into protein function and to select the most
critical interaction to target for drug development.

AxCell's cloning-of-ligand-targets (COLT) technology employs "recognition units" from the company's genetic
diversity library (GDL) to map functional protein interactions and quantitate their affinity. The company's
inter-functional proteomic database (IFP-dbase) elucidates protein interaction networks and
structure-activity relationships based on ligand affinity with protein modular domains.

Penning Disease Pathways 

Signal Pharmaceuticals, Inc.'s (San Diego, CA) integrated drug target and discovery effort is based on
mapping gene-regulating pathways in cells and identifying small molecules that regulate the activation of
those genes. In collaboration with academic researchers, the company has identified a large number of
regulatory proteins in several mitogen-activated protein (MAP) kinase pathways (including the JNK, ERK and p38
signaling pathways), which Signal is evaluating for the treatment of autoimmune, inflammatory, cardiovascular
and neurologic diseases, and cancer. Other target identification programs focus on the NF-kB pathway,
estrogen-related genes and central/peripheral nervous system genes.

Regulating cytokine production in immune and inflammatory disorders, and modifying bone metabolism to treat
osteoporosis are the focus of Signal's collaboration with Tanabe Seiyaku (Osaka, Japan). Signal has partnered
with Organon/Akzo Nobel (Netherlands) to identify estrogen-responsive genes as targets for treating
neurodegenerative and psychiatric diseases, atherosclerosis and ischemia, and with Roche Bioscience (Palo
Alto, CA) to develop human peripheral nerve cell lines for the discovery of treatments for pain and
incontinence.

Exelixis' (S. San Francisco, CA) strategy for target selection is to define disease pathways and identify
regulatory molecules that activate or inhibit those biochemical/genetic pathways. Based on the finding that
these pathways are conserved across species, the company is studying the model genetic systems of Drosophila
and Caenorhabditis elegans. Using its PathFinder technology, Exelixis systematically introduces mutations
into the genomes of these model organisms, looking for mutations that enhance or suppress the target
disease-related gene. These novel genes then become the basis of drug screening assays.

Cadus Pharmaceutical Corp. (Tarrytown, NY) is identifying surrogate ligands to newly discovered orphan
G-protein coupled transmembrane receptors of unknown function to determine the suitability of the receptors as
drug targets. Inserting the novel receptor in a yeast system yields a ligand that activates the receptor.
Access to a surrogate ligand allows the company to screen for receptor antagonists in the yeast system.

"The antagonist plus the surrogate ligand gives you two probes, an on probe and an off probe, which allows
you to look at function," explains David Webb, Ph.D., vp of research and chief scientific officer. A
surrogate ligand also provides information on which G-protein interacts with the orphan receptor and its
associated signaling pathways, further clarifying the role of the receptor as a potential drug target. Cadus'
collaboration with SmithKline (Philadelphia) capitalizes on Cadus' ability to determine orphan receptor
function, applying the technology to SmithKline's proprietary, newly discovered G-protein receptors.

Cadus' recombinant yeast system can also be used to screen cell and tissue extracts for natural ligands, and
the company is accelerating its internal drug-discovery efforts in the areas of cancer, inflammation and
allergy. A recent equity investment in Axiom Biotechnologies (San Diego, CA) gave Cadus a license to Axiom's
high-throughput pharmacologic screening system for lead optimization and discovery.

As its name implies, gene/Networks (Alameda, CA) focuses on identifying gene networks that contribute to
multigenic phenotypes and complex disease processes. The integration of mouse and human genetic studies forms
the basis of the technology. The Genome Tagged Mice database in development will serve as a library of natural
mouse genetic and phenotypic variation. Disease-related genes identified in mice are then evaluated in human
family- and population-based studies to confirm their clinical relevance and linkages to pathophysiologic
traits.

Blocking Gene Expression 

Inactivating a gene known to be expressed in association with a particular disease is one approach to
identifying appropriate therapeutic targets. The target validation and discovery program at Ribozyme
Pharmaceuticals, Inc. (Boulder, CO) applies the company's ribozyme technology to achieve selective inhibition
of gene expression in cell culture and in animals.

Correlation of the gene expression inhibition with phenotype can
SEE TARGET, P. 38
suggest the relative importance of the gene in disease pathology. The company's nuclease-resistant ribozymes
form the basis of a collaboration with Schering AG (Germany) for drug target validation and the development of
ribozyme-based therapeutic agents, and with Chiron Corp. (Emeryville, CA) for target validation.

With several antisense compounds now progressing through clinical trials, the concept of using
oligonucleotides to inhibit gene activity is not new. But rather than focusing on therapeutics development,
Sequitur, Inc. (Natick, MA) is creating antisense compounds for the purpose of determining gene function and
validating drug targets. Clients typically provide the one-year-old company with the sequence (or EST) of a
potential gene target and, in return, Sequitur custom designs a series of three to six antisense compounds that
yield a three-to-ten-fold inhibition of the target gene in cell culture. The company also provides
oligofectins, a series of cationic lipids, to deliver the oligonucleotides to a variety of cultured cells.

"Differential expression information is just for correlation, it doesn't tell function or confirm what would
be a good target," says Tod Woolf, Ph.D., director of technology development at Sequitur. Whereas, antisense
compounds will inhibit a target. Sequitur offers both phosphorothioate DNA antisense compounds, and its
proprietary Next Generation chimeric oligonucleotides, which have a higher hybridization affinity, greater
specificity and reduced toxicity, according to the company.

Mining Pathogen Genomes

Companies such as Human Genome Sciences (HGS; Rockville, MD), Incyte (Palo Alto, CA),




AxCell Biosciences scientists say their technology enables the rapid and simple functional identification of
the two essential mol... of protein interaction networks: specific recognition units that bind distinct
modular protein domains are identified and isolated using a combination structural/functional approach that
uses both peptide phage display Genetic Diversity Libraries (GDL) and bioinformatics, and Cloning of Ligand
Targets (COLT) technology utilizes recognition units as functional probes to isolate families of interactor
proteins.



Millennium Pharmaceuticals Inc. (Cambridge, MA) and Genome Therapeutics (Waltham, MA) are relying on
high-speed DNA sequencing, positional cloning and other strategies to identify specific microbial genomic
sites that would be good targets for infectious disease


HGS recently completed sequencing of the bacterial pathogen Streptococcus pneumoniae, which is the focus of an
agreement with Hoffmann-La Roche (Basel, Switzerland). Roche will use the sequence data to develop new
anti-infectives against S. pneumoniae. HGS and Roche have expanded their collaboration to include a
nonexclusive license to access sequence information for the intestinal bacterium Enterococcus faecalis.

Incyte Pharmaceuticals has completed one-fold coverage of the Candida albicans genome, identifying 60% of the
genes of this fungal pathogen. This genome will become part of the company's PathoSeq microbial database.
Incyte recently introduced the ZooSeq animal gene sequence and expression database. The database will provide
genomic information across various species commonly used in preclinical drug testing, which may help to better
define potential drug targets.

Millennium Pharmaceuticals continues to report success in identifying novel drug targets, having recently
discovered a novel chemokine called neurotactin and a new class of MAD-related proteins that inhibit
transforming growth factor beta (TGF-B) signaling. The company also received U.S. patent coverage for the tub
genes, believed to play a role in obesity, and for the gene that encodes the protein melastatin, which appears
to suppress metastasis in malignant melanoma.



Pangea

Smith, now a computer programmer, is an expert in systems integration, Internet technologies and the
application of industrial engineering principles to the drug discovery process. Before co-founding Pangea, he
was the manager of software development at Attorney's Briefcase, a legal research software company.

By being "in the trenches" with customers and collaborators, Bellenson and Smith sensed the frustration of
pharmaceutical researchers whose incompatible tools have impeded their progress. According to Bellenson, "Most
of them are geared toward analyzing one molecule at a time. It's like emptying the ocean with an eye dropper,
an incompatible eye dropper at that. A pharmaceutical company may have 30 different drug discovery teams with
various approaches. The problem is to manage the process of experimenting with a lot of different approaches,
to automate while maintaining flexibility."

GeneWorld 2.1 enables "integration of the entire target discovery and validation process," Bellenson says.
The commercial software package coordinates the entire process of sequence-data analysis and can be integrated
with other programs and databases, according to Smith, who adds that it handles thousands of sequence results,
organizes and automates annotation and seamlessly interacts with growing genome databases. Simple forms and
menus enable users to turn raw sequence data into crucial knowledge for drug discovery by applying algorithms
to sequences, creating custom analysis strategies and producing useful reports, without the need for writing
computer code. GeneWorld 2.1 runs on a variety of platforms and operating systems.

Pairing industrial relational database-management systems with a web-browser interface, Pangea's Operating
System of Drug Discovery is an open-computing framework that allows client/server and Java-enabled web-based
technologies to collect, organize and analyze drug discovery information for pharmaceutical companies to
simplify and accelerate drug discovery. The technology unites automated genomics database analysis for drug
target site selection, chemical information database analysis and large-scale combinatorial chemistry project
management and high-throughput screening project management for drug lead efficacy analysis. Pangea officials
maintain that these integrated elements provide a unified environment for chemists, biologists and others
involved in the drug discovery process to work together with commercial and public domain software.

Pangea's Operating System of Drug Discovery can accommodate Sybase, Oracle or Informix relational
database-management systems and any version of UNIX. It absorbs new data formats, databases, algorithms and
analysis paradigms into the automated workflow without software modifications. Netscape Navigator provides a
friendly user interface from PC, Macintosh, and UNIX workstations.

In the near term, Pangea plans to complete its bioinformatics core with two more programs. Gene Foundry, a
sample tracking and workflow sequence package for DNA sequence and fragment information, will also offer
interaction with robots, reagent tracking and troubleshooting. Gene Thesaurus, the other package, is a
"warehouse of bioinformatics data," says Bellenson.



Europe 

from page 30 

GTAC Chairman, Professor Norman C. Nevin, said 1996 saw "four important developments": an increase in
enquiries and submissions made to GTAC; an increase in the complexity of submitted protocols; a continuing
shift from gene therapy for single-gene disorders toward strategies aimed at tumour destruction in cancer; and
a growth in international sponsorship of U.K. gene therapy trials.

Since 1993, GTAC and its predecessor, the Clothier Committee, have approved 18 U.K. gene therapy clinical
trials (13 of which have been carried out), which are listed in the report. The disease areas targeted by
these trials include severe combined immunodeficiency (1 trial), cystic fibrosis (6), metastatic melanoma (2),
lymphoma (2), neuroblastoma (1), breast cancer (1), Hurler's syndrome (1), cervical cancer (1), glioblastoma

with liver metastases, glioblastoma, malignant ascites due to gastrointestinal cancer and ovarian cancer.

Copies of the GTAC third annual report are available from the GTAC Secretariat, Wellington House, 133-155
Waterloo Road, London SE1 8UG, UK.

Coated Lenses Prevent PCO 

Scientists in the U.K. say it may be possible to prevent posterior capsule opacification (PCO), a common
complication following cataract surgery, by using the implanted polymethylmethacrylate (PMMA) intraocular lens
as a drug delivery system. PCO occurs in 30-50% of cataract surgery patients as a result of stimulated cell
growth within the remaining capsular bag. The condition causes a decline in visual acuity and requires
expensive laser treatment, thus negating the routine use of cataract surgery in underdeveloped countries,
explains G. Duncan, at the

Exploring the Metabolic and Genetic Control of
Gene Expression on a Genomic Scale 

Joseph L. DeRisi, Vishwanath R. Iyer, Patrick O. Brown*

DNA microarrays containing virtually every gene of Saccharomyces cerevisiae were used
to carry out a comprehensive investigation of the temporal program of gene expression
accompanying the metabolic shift from fermentation to respiration. The expression
profiles observed for genes with known metabolic functions pointed to features of the
metabolic reprogramming that occur during the diauxic shift, and the expression patterns
of many previously uncharacterized genes provided clues to their possible functions. The
same DNA microarrays were also used to identify genes whose expression was affected
by deletion of the transcriptional co-repressor TUP1 or overexpression of the
transcriptional activator YAP1. These results demonstrate the feasibility and utility of this
approach to genomewide exploration of gene expression patterns.



The complete sequences of nearly a dozen 
microbial genomes are known, and in the 
next several years we expect to know the 
complete genome sequences of several 
metazoans, including the human genome. 
Defining the role of each gene in these 
genomes will be a formidable task, and un- 
derstanding how the genome functions as a 
whole in the complex natural history of a 
living organism presents an even greater 
challenge. 

Knowing when and where a gene is 
expressed often provides a strong clue as to 
its biological role. Conversely, the pattern 
of genes expressed in a cell can provide 
detailed information about its state. Al- 
though regulation of protein abundance in 
a cell is by no means accomplished solely 
by regulation of mRNA, virtually all dif- 
ferences in cell type or state are correlated 
with changes in the mRNA levels of many 
genes. This is fortuitous because the only 
specific reagent required to measure the 
abundance of the mRNA for a specific 
gene is a cDNA sequence. DNA microar- 
rays, consisting of thousands of individual 
gene sequences printed in a high-density 
array on a glass microscope slide (1, 2),
provide a practical and economical tool 
for studying gene expression on a very 
large scale (3-6). 

Saccharomyces cerevisiae is an especially 

Department of Biochemistry, Stanford University School
of Medicine, Howard Hughes Medical Institute, Stanford,
CA 94305-5428, USA.

* To whom correspondence should be addressed. E-mail:
pbrown@cmgm.stanford.edu



favorable organism in which to conduct a 
systematic investigation of gene expression. 
The genes are easy to recognize in the ge- 
nome sequence, cis regulatory elements are 
generally compact and close to the tran- 
scription units, much is already known 
about its genetic regulatory mechanisms, 
and a powerful set of tools is available for its 
analysis. 

A recurring cycle in the natural history 
of yeast involves a shift from anaerobic 
(fermentation) to aerobic (respiration) me- 
tabolism. Inoculation of yeast into a medi- 
um rich in sugar is followed by rapid growth 
fueled by fermentation, with the production 
of ethanol. When the fermentable sugar is 
exhausted, the yeast cells turn to ethanol as 
a carbon source for aerobic growth. This 
switch from anaerobic growth to aerobic 
respiration upon depletion of glucose, referred
to as the diauxic shift, is correlated
with widespread changes in the expression 
of genes involved in fundamental cellular 
processes such as carbon metabolism, pro- 
tein synthesis, and carbohydrate storage 
(7). We used DNA microarrays to charac- 
terize the changes in gene expression that 
take place during this process for nearly the 
entire genome, and to investigate the ge- 
netic circuitry that regulates and executes 
this program. 

Yeast open reading frames (ORFs) were 
amplified by the polymerase chain reaction 
(PCR), with a commercially available set of 
primer pairs (8). DNA microarrays, con- 
taining approximately 6400 distinct DNA 
sequences, were printed onto glass slides by 



using a simple robotic printing device (9). 
Cells from an exponentially growing culture 
of yeast were inoculated into fresh medium 
and grown at 30°C for 21 hours. After an 
initial 9 hours of growth, samples were har- 
vested at seven successive 2-hour intervals, 
and mRNA was isolated (10). Fluorescently 
labeled cDNA was prepared by reverse tran- 
scription in the presence of Cy3(green)- 
or Cy5(red)-labeled deoxyuridine triphos- 
phate (dUTP) (11) and then hybridized to 
the microarrays (12). To maximize the re- 
liability with which changes in expression 
levels could be discerned, we labeled cDNA 
prepared from cells at each successive time 
point with Cy5, then mixed it with a Cy3- 
labeled "reference" cDNA sample prepared 
from cells harvested at the first interval 
after inoculation. In this experimental de- 
sign, the relative fluorescence intensity, 
measured for the Cy3 and Cy5 fluors at 
each array element provides a reliable mea- 
sure of the relative abundance of the corre- 
sponding mRNA in the two cell popula- 
tions (Fig. 1). Data from the series of seven 
samples (Fig. 2), consisting of more than 
43,000 expression-ratio measurements, 
were organized into a database to facilitate 
efficient exploration and analysis of the 
results. This database is publicly available 
on the Internet (13). 
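
The two-color ratio measurement described above can be sketched in a few lines of Python. This is only a minimal illustration, not the authors' analysis software: the function names, the optional per-channel background arguments, and the example intensities are hypothetical, and reporting the ratio on a log2 scale is an assumption made for convenience.

```python
import math


def expression_ratio(cy5, cy3, background_cy5=0.0, background_cy3=0.0):
    """Return the Cy5/Cy3 ratio for one array element.

    cy5 is the signal from the later time point and cy3 the signal from
    the reference sample; the background arguments are hypothetical.
    """
    red = cy5 - background_cy5
    green = cy3 - background_cy3
    if red <= 0 or green <= 0:
        raise ValueError("background-corrected intensities must be positive")
    return red / green


def log2_ratio(cy5, cy3, **backgrounds):
    """Same ratio on a log2 scale, so induction and repression are symmetric."""
    return math.log2(expression_ratio(cy5, cy3, **backgrounds))


# Hypothetical intensities (arbitrary units) for a single array element.
print(expression_ratio(cy5=1200.0, cy3=300.0))  # 4.0, a fourfold induction
print(log2_ratio(cy5=1200.0, cy3=300.0))        # 2.0
```

On the log2 scale a fourfold induction (+2) and a fourfold repression (-2) are equally far from zero, which is why that scale is often preferred for this kind of data.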

During exponential growth in glucose- 
rich medium, the global pattern of gene 
expression was remarkably stable. Indeed, 
when gene expression patterns between the 
first two cell samples (harvested at a 2-hour 
interval) were compared, mRNA levels dif- 
fered by a factor of 2 or more for only 19 
genes (0.3%), and the largest of these dif- 
ferences was only 2.7-fold (14). However, as 
glucose was progressively depleted from the 
growth media during the course of the ex- 
periment, a marked change was seen in the 
global pattern of gene expression. mRNA 
levels for approximately 710 genes were 
induced by a factor of at least 2, and the 
mRNA levels for approximately 1030 genes 
declined by a factor of at least 2. Messenger 
RNA levels for 183 genes increased by a 
factor of at least 4, and mRNA levels for 
203 genes diminished by a factor of at least 
4. About half of these differentially ex- 
pressed genes have no currently recognized 
function and are not yet named. Indeed, 
more than 400 of the differentially ex- 
pressed genes have no apparent homology 
to any gene whose function is known (15). 
The responses of these previously unchar- 
acterized genes to the diauxic shift therefore 
provide the first small clue to their possible 
roles. 
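
The fold-change tallies quoted in this paragraph amount to counting genes whose expression ratio crosses a threshold in either direction. A minimal sketch follows, assuming the ratios are held in a dictionary keyed by gene; the gene labels and values are invented for illustration and are not data from this study.

```python
def count_changed(ratios, fold=2.0):
    """Count genes induced or repressed by at least `fold`.

    `ratios` maps a gene label to its expression ratio relative to the
    reference sample (ratio > 1 means induced, ratio < 1 means repressed).
    """
    induced = sum(1 for r in ratios.values() if r >= fold)
    repressed = sum(1 for r in ratios.values() if r <= 1.0 / fold)
    return induced, repressed


# Hypothetical ratios for five array elements.
example = {"orf_a": 4.5, "orf_b": 0.2, "orf_c": 1.1, "orf_d": 2.0, "orf_e": 0.5}

print(count_changed(example, fold=2.0))  # (2, 2)
print(count_changed(example, fold=4.0))  # (1, 1)

# The percentage quoted in the text: 19 genes out of roughly 6400 elements.
print(round(100 * 19 / 6400, 1))  # 0.3
```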

The global view of changes in expres- 
sion of genes with known functions pro- 
vides a vivid picture of the way in which 
the cell adapts to a changing environ- 
ment. Figure 3 shows a portion of the yeast 
metabolic pathways involved in carbon 
and energy metabolism. Mapping the 
changes we observed in the mRNAs en- 
coding each enzyme onto this framework 
allowed us to infer the redirection in the 
flow of metabolites through this system. 
We observed large inductions of the genes 
coding for the enzymes aldehyde dehydro- 
genase (ALD2) and acetyl-coenzyme 
A (CoA) synthase (ACS1), which func- 
tion together to convert the products of 
alcohol dehydrogenase into acetyl-CoA, 
which in turn is used to fuel the tricarbox- 
ylic acid (TCA) cycle and the glyoxylate 
cycle. The concomitant shutdown of tran- 
scription of the genes encoding pyruvate 
decarboxylase and induction of pyruvate 
carboxylase rechannels pyruvate away 
from acetaldehyde, and instead to oxalac- 
etate, where it can serve to supply the 
TCA cycle and gluconeogenesis. Induc- 
tion of the pivotal genes PCK1, encoding 
phosphoenolpyruvate carboxykinase, and 
FBP1, encoding fructose 1,6-biphos- 
phatase, switches the directions of two key 
irreversible steps in glycolysis, reversing 
the flow of metabolites along the revers- 
ible steps of the glycolytic pathway toward 
the essential biosynthetic precursor, glu- 
cose-6-phosphate. Induction of the genes 
coding for the trehalose synthase and gly- 
cogen synthase complexes promotes chan- 
neling of glucose-6-phosphate into these 
carbohydrate storage pathways. 

Just as the changes in expression of 
genes encoding pivotal enzymes can pro- 
vide insight into metabolic reprogram- 
ming, the behavior of large groups of func- 
tionally related genes can provide a broad 
view of the systematic way in which the 
yeast cell adapts to a changing environ- 
ment (Fig. 4). Several classes of genes, 
such as cytochrome c-related genes and 
those involved in the TCA/glyoxylate cy- 
cle and carbohydrate storage, were coordi- 
nately induced by glucose exhaustion. In 
contrast, genes devoted to protein synthe- 
sis, including ribosomal proteins, tRNA 
synthetases, and translation, elongation, 
and initiation factors, exhibited a coordi- 
nated decrease in expression. More than 
95% of ribosomal genes showed at least 
twofold decreases in expression during the 
diauxic shift (Fig. 4) (13). A noteworthy 
and illuminating exception was that the 



genes encoding mitochondrial ribosomal 
genes were generally induced rather than 
repressed after glucose limitation, high- 
lighting the requirement for mitochondrial 
biogenesis (13). As more is learned about 
the functions of every gene in the yeast 
genome, the ability to gain insight into a 
cell's response to a changing environment 
through its global gene expression patterns 
will become increasingly powerful. 

Several distinct temporal patterns of ex- 
pression could be recognized, and sets of 
genes could be grouped on the basis of the 
similarities in their expression patterns. The 
characterized members of each of these 
groups also shared important similarities in 
their functions. Moreover, in most cases, 
common regulatory mechanisms could be 
inferred for sets of genes with similar expres- 
sion profiles. For example, seven genes 
showed a late induction profile, with mRNA 
levels increasing by more than ninefold at 



the last timepoint but less than threefold at 
the preceding timepoint (Fig. 5B). All of 
these genes were known to be glucose-re- 
pressed, and five of the seven were previously 
noted to share a common upstream activat- 
ing sequence (UAS), the carbon source re- 
sponse element (CSRE) (16-20). A search 
in the promoter regions of the remaining two 
genes, ACR1 and IDP2, revealed that 
ACR1, a gene essential for ACS1 activity, 
also possessed a consensus CSRE motif, but 
interestingly, IDP2 did not. A search of the 
entire yeast genome sequence for the con- 
sensus CSRE motif revealed only four addi- 
tional candidate genes, none of which 
showed a similar induction. 
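
The "late induction" criterion used above (more than ninefold at the last timepoint, less than threefold at the preceding one) can be written as a simple filter over time-series profiles. The sketch below is illustrative only: the function name, the data layout, and the example profiles are assumptions, not the procedure or data of this study.

```python
def late_induced(profiles, last_min=9.0, prev_max=3.0):
    """Return genes induced more than `last_min`-fold at the final time
    point but less than `prev_max`-fold at the one before it.

    `profiles` maps a gene label to its list of expression ratios,
    ordered by time point.
    """
    return sorted(
        gene
        for gene, series in profiles.items()
        if len(series) >= 2 and series[-1] > last_min and series[-2] < prev_max
    )


# Hypothetical seven-point profiles.
profiles = {
    "gene1": [1.0, 1.1, 1.3, 1.8, 2.2, 2.6, 12.0],  # late induction
    "gene2": [1.0, 1.5, 3.5, 6.0, 8.0, 9.5, 11.0],  # early, gradual induction
    "gene3": [1.0, 0.9, 0.8, 0.5, 0.3, 0.2, 0.2],   # repression
}

print(late_induced(profiles))  # ['gene1']
```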
Examples from additional groups of 
genes that shared expression profiles are 
illustrated in Fig. 5, C through F. The 
sequences upstream of the named genes in 
Fig. 5C all contain stress response ele- 
ments (STRE), and with the exception 

Fig. 1. Yeast genome microarray. The actual size of the microarray is 18 mm by 18 mm. The 
microarray was printed as described (9). This image was obtained with the same fluorescent 
scanning confocal microscope used to collect all the data we report (49). A fluorescently labeled 
cDNA probe was prepared from mRNA isolated from cells harvested shortly after inoculation (culture 
density of <5 x 10^6 cells/ml and media glucose level of 19 g/liter) by reverse transcription in the 
presence of Cy3-dUTP. Similarly, a second probe was prepared from mRNA isolated from cells taken 
from the same culture 9.5 hours later (culture density of ~2 x 10^8 cells/ml, with a glucose level of 
<0.2 g/liter) by reverse transcription in the presence of Cy5-dUTP. In this image, hybridization of the 
Cy3-dUTP-labeled cDNA (that is, mRNA expression at the initial timepoint) is represented as a green 
signal, and hybridization of Cy5-dUTP-labeled cDNA (that is, mRNA expression at 9.5 hours) is 
represented as a red signal. Thus, genes induced or repressed after the diauxic shift appear in this 
image as red and green spots, respectively. Genes expressed at roughly equal levels before and after 
the diauxic shift appear in this image as yellow spots. 

of HSP42, have previously been shown to 
be controlled at least in part by these 
elements (21-24). Inspection of the se- 
quences upstream of HSP42 and the two 
uncharacterized genes shown in Fig. 5C, 
YKL026c, a hypothetical protein with 
similarity to glutathione peroxidase, and 
YGR043c, a putative transaldolase, re- 
vealed that each of these genes also pos- 
sesses repeated upstream copies of the stress- 
responsive CCCCT motif. Of the 13 ad- 
ditional genes in the yeast genome that 
shared this expression profile [including 
HSP30, ALD2, OM45, and 10 uncharac- 
terized ORFs (25)], nine contained one or 
more recognizable STRE sites in their up- 
stream regions. 

The heterotrimeric transcriptional acti- 
vator complex HAP2,3,4 has been shown 
to be responsible for induction of several 
genes important for respiration (26-28). 
This complex binds a degenerate consensus 
sequence known as the CCAAT box (26). 
Computer analysis, using the consensus se- 
quence TNRYTGGB (29), has suggested 
that a large number of genes involved in 
respiration may be specific targets of 
HAP2,3,4 (30). Indeed, a putative 
HAP2,3,4 binding site could be found in 
the sequences upstream of each of the seven 
cytochrome c-related genes that showed 
the greatest magnitude of induction (Fig. 
5D). Of 12 additional cytochrome c-related 
genes that were induced, HAP2,3,4 binding 
sites were present in all but one. Signifi- 
cantly, we found that transcription of 
HAP4 itself was induced nearly ninefold 
concomitant with the diauxic shift. 
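
A degenerate consensus such as TNRYTGGB can be searched for by expanding each IUPAC ambiguity code into a character class. The following is a minimal sketch of that idea, not the program used in the analyses cited above: the function names and the test sequence are hypothetical, and scanning both strands is an assumption.

```python
import re

# Standard IUPAC nucleotide ambiguity codes expanded to regex classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_sites(seq, consensus):
    """Return sorted (start, strand) pairs for every consensus match.

    `start` is 0-based on the given (forward) sequence; a lookahead is
    used so that overlapping matches are all reported.
    """
    seq = seq.upper()
    width = len(consensus)
    body = "".join(IUPAC[base] for base in consensus.upper())
    pattern = re.compile("(?=(" + body + "))")
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    rc = reverse_complement(seq)
    hits += [(len(seq) - m.start() - width, "-") for m in pattern.finditer(rc)]
    return sorted(hits)


# Hypothetical upstream fragment containing one forward-strand match.
fragment = "AAATGATTGGCAAA"
print(find_sites(fragment, "TNRYTGGB"))  # [(3, '+')]
```

Note that the reverse complement of a TNRYTGGB match contains the CCAAT-like core, which is one reason a search of this kind would normally cover both strands.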

Control of ribosomal protein biogenesis 
is mainly exerted at the transcriptional 
level, through the presence of a common 
upstream-activating element (UASrpg) 
that is recognized by the Rap1 DNA-bind- 
ing protein (31, 32). The expression pro- 
files of seven ribosomal proteins are shown 
in Fig. 5F. A search of the sequences 
upstream of all seven genes revealed con- 
sensus Rap1-binding motifs (33). It has 
been suggested that declining Rap1 levels 
in the cell during starvation may be re- 
sponsible for the decline in ribosomal pro- 
tein gene expression (34). Indeed, we ob- 
served that the abundance of RAP1 
mRNA diminished by 4.4-fold, at about 
the time of glucose exhaustion. 

Of the 149 genes that encode known or 
putative transcription factors, only two, 
HAP4 and SIP4, were induced by a factor of 
more than threefold at the diauxic shift. 
SIP4 encodes a DNA-binding transcrip- 
tional activator that has been shown to 
interact with Snf1, the "master regulator" of 
glucose repression (35). The eightfold in- 
duction of SIP4 upon depletion of glucose 
strongly suggests a role in the induction of 



downstream genes at the diauxic shift. 

Although most of the transcriptional 
responses that we observed were not pre- 
viously known, the responses of many 
genes during the diauxic shift have been 
described. Comparison of the results we 
obtained by DNA microarray hybridiza- 
tion with previously reported results there- 
fore provided a strong test of the sensitiv- 
ity and accuracy of this approach. The 
expression patterns we observed for previ- 
ously characterized genes showed almost 
perfect concordance with previously pub- 
lished results (36). Moreover, the differ- 
ential expression measurements obtained 
by DNA microarray hybridization were re- 
producible in duplicate experiments. For 
example, the remarkable changes in gene 
expression between cells harvested imme- 
diately after inoculation and immediately 
after the diauxic shift (the first and sixth 
intervals in this time series) were mea- 
sured in duplicate, independent DNA mi- 
croarray hybridizations. The correlation 
coefficient for two complete sets of expres- 
sion ratio measurements was 0.87, and for 
more than 95% of the genes, the expres- 



sion ratios measured in these duplicate 
experiments differed by less than a factor 
of 2. However, in a few cases, there were 
discrepancies between our results and pre- 
vious results, pointing to technical limita- 
tions that will need to be addressed as 
DNA microarray technology advances 
(37, 38). Despite the noted exceptions, 
the high concordance between the results 
we obtained in these experiments and 
those of previous studies provides confi- 
dence in the reliability and thoroughness 
of the survey. 
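
The two reproducibility figures quoted above (a correlation coefficient between duplicate hybridizations, and the fraction of genes whose duplicate ratios agree within a factor of 2) can be computed as sketched below. The source does not say whether the correlation was taken on raw or log-transformed ratios; the log2 transform here is an assumption, and the duplicate values are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fraction_within_fold(ratios_1, ratios_2, fold=2.0):
    """Fraction of genes whose duplicate ratios differ by less than `fold`."""
    limit = math.log2(fold)
    within = sum(
        1
        for a, b in zip(ratios_1, ratios_2)
        if abs(math.log2(a) - math.log2(b)) < limit
    )
    return within / len(ratios_1)


# Hypothetical expression ratios for four genes in two duplicate arrays.
dup_1 = [1.0, 2.0, 4.0, 0.5]
dup_2 = [1.1, 1.8, 9.0, 0.45]

r = pearson([math.log2(v) for v in dup_1], [math.log2(v) for v in dup_2])
print(round(r, 3))
print(fraction_within_fold(dup_1, dup_2))  # 0.75
```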

The changes in gene expression during 
this diauxic shift are complex and involve 
integration of many kinds of information 
about the nutritional and metabolic state 
of the cell. The large number of genes 
whose expression is altered and the diver- 
sity of temporal expression profiles ob- 
served in this experiment highlight the 
challenge of understanding the underlying 
regulatory mechanisms. One approach to 
defining the contributions of individual 
regulatory genes to a complex program of 
this kind is to use DNA microarrays to 
identify genes whose expression is affected 



Fig. 2. The section of the ar- 
ray indicated by the gray box 
in Fig. 1 is shown for each of 
the experiments described 
here. Representative genes 
are labeled. In each of the ar- 
rays used to analyze gene 
expression during the diauxic 
shift, red spots represent 
genes that were induced rel- 
ative to the initial timepoint, 
and green spots represent 
genes that were repressed 
relative to the initial timepoint. 
In the arrays used to analyze 
the effects of the tup1Δ mu- 
tation and YAP1 overexpres- 
sion, red spots represent 
genes whose expression was 
increased, and green spots 
represent genes whose ex- 
pression was decreased by 
the genetic modification. Note 
that distinct sets of genes are 
induced and repressed in the 
different experiments. The 
complete images of each of 
these arrays can be viewed on 
the Internet (13). Cell density 
as measured by optical densi- 
ty (OD) at 600 nm was used to 
measure the growth of the 
culture. 

by mutations in each putative regulatory 
gene. As a test of this strategy, we analyzed 
the genomewide changes in gene expression 
that result from deletion of the TUP1 gene. 
Transcriptional repression of many genes by 
glucose requires the DNA-binding repressor 



Mig1 and is mediated by recruiting the tran- 
scriptional co-repressors Tup1 and Cyc8/ 
Ssn6 (39). Tup1 has also been implicated in 
repression of oxygen-regulated, mating-type- 
specific, and DNA-damage-inducible genes 
(40). 








Fig. 3. Metabolic reprogramming inferred from global analysis of changes in gene expression. Only key 
metabolic intermediates are identified. The yeast genes encoding the enzymes that catalyze each step 
in this metabolic circuit are identified by name in the boxes. The genes encoding succinyl-CoA synthase 
and glycogen-debranching enzyme have not been explicitly identified, but the ORFs YGR244 and 
YPR184 show significant homology to known succinyl-CoA synthase and glycogen-debranching en- 
zymes, respectively, and are therefore included in the corresponding steps in this figure. Red boxes with 
white lettering identify genes whose expression increases in the diauxic shift. Green boxes with dark 
green lettering identify genes whose expression diminishes in the diauxic shift. The magnitude of 
induction or repression is indicated for these genes. For multimeric enzyme complexes, such as 
succinate dehydrogenase, the indicated fold-induction represents an unweighted average of all the 
genes listed in the box. Black and white boxes indicate no significant differential expression (less than 
twofold). The direction of the arrows connecting reversible enzymatic steps indicates the direction of the 
flow of metabolic intermediates, inferred from the gene expression pattern, after the diauxic shift. Arrows 
representing steps catalyzed by genes whose expression was strongly induced are highlighted in red. 
The broad gray arrows represent major increases in the flow of metabolites after the diauxic shift, 
inferred from the indicated changes in gene expression. 



Wild-type yeast cells and cells bearing 
a deletion of the TUP1 gene (tup1Δ) were 
grown in parallel cultures in rich medium 
containing glucose as the carbon source. 
Messenger RNA was isolated from expo- 
nentially growing cells from the two pop- 
ulations and used to prepare cDNA la- 
beled with Cy3 (green) and Cy5 (red), 
respectively (11). The labeled probes were 
mixed and simultaneously hybridized to 
the microarray. Red spots on the microar- 
ray therefore represented genes whose 
transcription was induced in the tup1Δ 
strain, and thus presumably repressed by 
Tup1 (41). A representative section of the 
microarray (Fig. 2, bottom middle panel) 
illustrates that the genes whose expression 
was affected by the tup1Δ mutation were, 
in general, distinct from those induced 
upon glucose exhaustion [complete images 
of all the arrays shown in Fig. 2 are avail- 
able on the Internet (13)]. Nevertheless, 
34 (10%) of the genes that were induced 
by a factor of at least 2 after the diauxic 
shift were similarly induced by deletion of 
TUP1, suggesting that these genes may be 
subject to TUP1-mediated repression by 
glucose. For example, SUC2, the gene en- 
coding invertase, and all five hexose trans- 
porter genes that were induced during the 
course of the diauxic shift were similarly 
induced, in duplicate experiments, by the 
deletion of TUP1. 
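
Comparing the genes induced after the diauxic shift with those induced by the TUP1 deletion is, computationally, a set intersection. A minimal sketch follows; the function name is hypothetical and, apart from SUC2 (named in the text), the gene labels are placeholders rather than results from this study.

```python
def shared_targets(induced_a, induced_b):
    """Return the genes induced in both experiments and the fraction of
    the first set that they represent."""
    set_a, set_b = set(induced_a), set(induced_b)
    shared = set_a & set_b
    fraction = len(shared) / len(set_a) if set_a else 0.0
    return shared, fraction


# Placeholder gene lists (each induced by a factor of at least 2).
diauxic_induced = ["SUC2", "orf_1", "orf_2", "orf_3", "orf_4"]
tup1_induced = ["SUC2", "orf_4", "orf_9"]

shared, fraction = shared_targets(diauxic_induced, tup1_induced)
print(sorted(shared))  # ['SUC2', 'orf_4']
print(fraction)        # 0.4
```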

The set of genes affected by Tup1 in this 
experiment also included α-glucosidases, 
the mating-type-specific genes MFA1 and 
MFA2, and the DNA damage-inducible 
RNR2 and RNR4, as well as genes involved 
in flocculation and many genes of unknown 
function. The hybridization signal corre- 
sponding to expression of TUP1 itself was 
also severely reduced because of the (in- 
complete) deletion of the transcription unit 
in the tup1Δ strain, providing a positive 
control in the experiment (42). 

Many of the transcriptional targets of 
Tup1 fell into sets of genes with related 
biochemical functions. For instance, al- 
though only about 3% of all yeast genes 
appeared to be TUP1-repressed by a factor 
of more than 2 in duplicate experiments 
under these conditions, 6 of the 13 genes 
that have been implicated in flocculation 
(15) showed a reproducible increase in 
expression of at least twofold when TUP1 
was deleted. Another group of related 
genes that appeared to be subject to TUP1 
repression encodes the serine-rich cell 
wall mannoproteins, such as Tip1 and 
Tir1/Srp1, which are induced by cold 
shock and other stresses (43), and similar, 
serine-poor proteins, the seripauperins 
(44). Messenger RNA levels for 23 of the 
26 genes in this group were reproducibly 
elevated by at least 2.5-fold in the tup1Δ 
strain, and 18 of these genes were induced 
by more than sevenfold when TUP1 was 
deleted. In contrast, none of 83 genes that 
could be classified as putative regulators of 
the cell division cycle were induced more 
than twofold by deletion of TUP1. Thus, 
despite the diversity of the regulatory sys- 
tems that employ Tup1, most of the genes 
that it regulates under these conditions 
fall into a limited number of distinct func- 
tional classes. 

Because the microarray allows us to 
monitor expression of nearly every gene in 
yeast, we can, in principle, use this ap- 
proach to identify all the transcriptional 
targets of a regulatory protein like Tup1. It 
is important to note, however, that in any 
single experiment of this kind we can only 
recognize those target genes that are nor- 
mally repressed (or induced) under the 
conditions of the experiment. For in- 
stance, the experiment described here an- 
alyzed a MATα strain in which MFA1 
and MFA2, the genes encoding the a- 
factor mating pheromone precursor, are 
normally repressed. In the isogenic tup1Δ 
strain, these genes were inappropriately 
expressed, reflecting the role that Tup1 
plays in their repression. Had we instead 
carried out this experiment with a MATa 
strain (in which expression of MFA1 and 
MFA2 is not repressed), it would not have 
been possible to conclude anything re- 
garding the role of Tup1 in the repression 
of these genes. Conversely, we cannot dis- 
tinguish indirect effects of the chronic 
absence of Tup1 in the mutant strain from 
effects directly attributable to its partici- 
pation in repressing the transcription of a 
gene. 

Another simple route to modulating the 
activity of a regulatory factor is to overex- 
press the gene that encodes it. YAP1 en- 
codes a DNA-binding transcription factor 
belonging to the b-zip class of DNA-bind- 
ing proteins. Overexpression of YAP1 in 
yeast confers increased resistance to hydro- 
gen peroxide, o-phenanthroline, heavy 
metals, and osmotic stress (45). We ana- 
lyzed differential gene expression between a 
wild-type strain bearing a control plasmid 
and a strain with a plasmid expressing YAP1 
under the control of the strong GAL1-10 
promoter, both grown in galactose (that is, 
a condition that induces YAP1 overexpres- 
sion). Complementary DNA from the con- 
trol and YAP1-overexpressing strains, la- 
beled with Cy3 and Cy5, respectively, was 
prepared from mRNA isolated from the two 
strains and hybridized to the microarray. 
Thus, red spots on the array represent genes 
that were induced in the strain overexpress- 
ing YAP1. 

Of the 17 genes whose mRNA levels 
increased by more than threefold when 



YAP1 was overexpressed in this way, five 
bear homology to aryl-alcohol oxidoreduc- 
tases (Fig. 2 and Table 1). An additional 
four of the genes in this set also belong to 
the general class of dehydrogenases/oxi- 
doreductases. Very little is known about 
the role of aryl-alcohol oxidoreductases in 
S. cerevisiae, but these enzymes have been 
isolated from ligninolytic fungi, in which 
they participate in coupled redox reac- 
tions, oxidizing aromatic and aliphatic 
unsaturated alcohols to aldehydes with the 
production of hydrogen peroxide (46, 47). 
The fact that a remarkable fraction of the 
targets identified in this experiment be- 
long to the same small, functional group of 
oxidoreductases suggests that these genes 

Fig. 4. Coordinated regulation of functionally related genes. The curves represent the average 
induction or repression ratios for all the genes in each indicated group. The total number of 
genes in each group was as follows: ribosomal proteins, 112; translation elongation and initiation 
factors, 25; tRNA synthetases (excluding mitochondrial synthetases), 17; glycogen and trehalose syn- 
thesis and degradation, 15; cytochrome c oxidase and reductase proteins, 19; and TCA- and glyoxy- 
late-cycle enzymes, 24. 

Table 1. Genes induced by YAP1 overexpression. This list includes all the genes for which mRNA levels 
increased by more than twofold upon YAP1 overexpression in both of two duplicate experiments, and 
for which the average increase in mRNA level in the two experiments was greater than threefold (50). 
Positions of the canonical Yap1 binding sites upstream of the start codon, when present, and the 
average fold-increase in mRNA levels measured in the two experiments are indicated. 



might play an important protective role 
during oxidative stress. Transcription of a 
small number of genes was reduced in the 
strain overexpressing Yap1. Interestingly, 
many of these genes encode sugar per- 
meases or enzymes involved in inositol 
metabolism. 

We searched for Yap1-binding sites 
(TTACTAA or TGACTAA) in the se- 
quences upstream of the target genes we 
identified (48). About two-thirds of the 
genes that were induced by more than 
threefold upon Yap1 overexpression had 
one or more binding sites within 600 bases 
upstream of the start codon (Table 1), sug- 
gesting that they are directly regulated by 
Yap1. The absence of canonical Yap1-bind- 
ing sites upstream of the others may reflect 
an ability of Yap1 to bind sites that differ 
from the canonical binding sites, perhaps in 
cooperation with other factors, or, less like- 
ly, may represent an indirect effect of Yap1 
overexpression, mediated by one or more 
intermediary factors. Yap1 sites were found 
only four times in the corresponding region 
of an arbitrary set of 30 genes that were not 
differentially regulated by Yap1. 
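
The binding-site search described above can be sketched as a scan of the 600 bases preceding the start codon for either canonical site. This is an illustration under stated assumptions, not the search actually performed (48): the function name and the test sequences are hypothetical, and checking the reverse strand as well is an assumption, since the text does not say which strands were scanned.

```python
import re

YAP1_SITES = ("TTACTAA", "TGACTAA")  # canonical sites quoted in the text
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def yap1_site_offsets(upstream, window=600):
    """Return offsets of canonical sites relative to the start codon.

    `upstream` is the sequence (5' to 3') ending immediately before the
    start codon. Each offset is negative and gives the position of the
    site's first forward-strand base, so -100 means the site begins
    100 bases upstream. Reverse-strand occurrences are included.
    """
    region = upstream.upper()[-window:]
    motifs = set(YAP1_SITES)
    motifs |= {site.translate(COMPLEMENT)[::-1] for site in YAP1_SITES}
    pattern = re.compile("(?=(" + "|".join(sorted(motifs)) + "))")
    return [m.start() - len(region) for m in pattern.finditer(region)]


# Hypothetical 800-base upstream region with one site 100 bases upstream.
upstream = "G" * 700 + "TTACTAA" + "C" * 93
print(yap1_site_offsets(upstream))  # [-100]
```

A control comparable to the one reported in the text (30 genes not regulated by Yap1) would apply the same function to a set of unrelated upstream regions and count the hits.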

Use of a DNA microarray to character- 
ize the transcriptional consequences of 
mutations affecting the activity of regula- 
tory molecules provides a simple and pow- 
erful approach to dissection and character- 
ization of regulatory pathways and net- 



works. This strategy also has an important 
practical application in drug screening. 
Mutations in specific genes encoding can- 
didate drug targets can serve as surrogates 
for the ideal chemical inhibitor or modu- 
lator of their activity. DNA microarrays 
can be used to define the resulting signa- 
ture pattern of alterations in gene expres- 
sion, and then subsequently used in an 
assay to screen for compounds that repro- 
duce the desired signature pattern. 

DNA microarrays provide a simple and 
economical way to explore gene expres- 
sion patterns on a genomic scale. The 
hurdles to extending this approach to any 
other organism are minor. The equipment 



required for fabricating and using DNA 
microarrays (9) consists of components 
that were chosen for their modest cost and 
simplicity. It was feasible for a small group 
to accomplish the amplification of more 
than 6000 genes in about 4 months and, 
once the amplified gene sequences were in 
hand, only 2 days were required to print a 
set of 110 microarrays of 6400 elements 
each. Probe preparation, hybridization, 
and fluorescent imaging are also simple 
procedures. Even conceptually simple ex- 
periments, as we described here, can yield 
vast amounts of information. The value of 
the information from each experiment of 
this kind will progressively increase as 
more is learned about the functions of 
each gene and as additional experiments 
define the global changes in gene expres- 
sion in diverse other natural processes and 
genetic perturbations. Perhaps the greatest 
challenge now is to develop efficient 
methods for organizing, distributing, inter- 
preting, and extracting insights from the 
large volumes of data these experiments 
will provide. 

Fig. 5. Distinct temporal patterns of induction or repression help to group genes that share regulatory 
properties. (A) Temporal profile of the cell density, as measured by OD at 600 nm and glucose 
concentration in the media. (B) Seven genes exhibited a strong induction (greater than ninefold) only at 
the last timepoint (20.5 hours). With the exception of IDP2, each of these genes has a CSRE UAS. There 
were no additional genes observed to match this profile. (C) Seven members of a class of genes marked 
by early induction with a peak in mRNA levels at 18.5 hours. Each of these genes contains STRE motif 
repeats in its upstream promoter region. (D) Cytochrome c oxidase and ubiquinol cytochrome c 
reductase genes. Marked by an induction coincident with the diauxic shift, each of these genes contains 
a consensus binding motif for the HAP2,3,4 protein complex. At least 17 genes shared a similar 
expression profile. (E) SAM1, GPP1, and several genes of unknown function are repressed before the 
diauxic shift, and continue to be repressed upon entry into stationary phase. (F) Ribosomal protein 
genes comprise a large class of genes that are repressed upon depletion of glucose. Each of the genes 
profiled here contains one or more RAP1-binding motifs upstream of its promoter. RAP1 is a transcrip- 
tional regulator of most ribosomal proteins. 

tion, the bound DNA was denatured by a 2-min in- 
cubation in distilled water at ~95°C. The slides were 
then transferred into a bath of 100% ethanol at room 
temperature, rinsed, and then spun dry in a clinical 
centrifuge. Slides were stored in a closed box at 
room temperature until used. 

10. YPD medium (8 liters), in a 10-liter fermentation 
vessel, was inoculated with 2 ml of a fresh over- 
night culture of yeast strain DBY7286 (MATa, ura3, 
GAL2). The fermentor was maintained at 30°C with 
constant agitation and aeration. The glucose con- 
tent of the media was measured with a UV test kit 
(Boehringer Mannheim, catalog number 716251). 
Cell density was measured by OD at 600-nm wave- 
length. Aliquots of culture were rapidly withdrawn 
from the fermentation vessel by peristaltic pump, 
spun down at room temperature, and then flash 
frozen with liquid nitrogen. Frozen cells were stored 
at -80°C. 

11. Cy3-dUTP or Cy5-dUTP (Amersham) was incorpo- 
rated during reverse transcription of 1.25 μg of 
polyadenylated [poly(A)+] RNA, primed by a dT(16) 
oligomer. This mixture was heated to 70°C for 10 
min and then transferred to ice. A premixed solu- 
tion consisting of 200 U Superscript II (Gibco), 
buffer, deoxyribonucleoside triphosphates, and flu- 
orescent nucleotides was added to the RNA. Nu- 
cleotides were used at these final concentrations: 
500 μM for dATP, dCTP, and dGTP and 200 μM 
for dTTP. Cy3-dUTP and Cy5-dUTP were used at 
a final concentration of 100 μM. The reaction was 
then incubated at 42°C for 2 hours. Unincorporat- 
ed fluorescent nucleotides were removed by first 
diluting the reaction mixture with 470 μl of 10 
mM tris-HCl (pH 8.0)/1 mM EDTA and then subse- 
quently concentrating the mix to ~5 μl using Cen- 
tricon-30 microconcentrators (Amicon). 

12. Purified, labeled cDNA was resuspended in 11 μl of 
3.5× SSC containing 10 μg poly(dA) and 0.3 μl of 
10% SDS. Before hybridization, the solution was 
boiled for 2 min and then allowed to cool to room 
temperature. The solution was applied to the mi- 
croarray under a cover slip, and the slide was 
placed in a custom hybridization chamber, which 
was subsequently incubated for ~8 to 12 hours in 
a water bath at 62°C. Before scanning, slides were 
washed in 2× SSC, 0.2% SDS for 5 min, and then 
0.05× SSC for 1 min. Slides were dried before 
scanning by centrifugation at 500 rpm in a Beck- 
man CS-6R centrifuge. 

13. The complete data set is available on the Internet at
cmgm.stanford.edu/pbrown/explore/index.html.

14. For 95% of all the genes analyzed, the mRNA levels
measured in cells harvested at the first and second
interval after inoculation differed by a factor of less
than 1.5. The correlation coefficient for the comparison
between mRNA levels measured for each gene
in these two different mRNA samples was 0.98.
When duplicate mRNA preparations from the same
cell sample were compared in the same way, the
correlation coefficient between the expression levels
measured for the two samples by comparative
hybridization was 0.99.
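
As an illustrative sketch only (not the authors' software), the reproducibility comparison described in this note could be computed as follows. The sample values are hypothetical, the function names are invented for illustration, and the use of a Pearson coefficient on untransformed intensities is an assumption; the note does not state which coefficient or scale was used.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fraction_within_factor(x, y, factor=1.5):
    """Fraction of genes whose two measurements differ by less than `factor`.

    Assumes all measurements are strictly positive.
    """
    within = sum(1 for a, b in zip(x, y) if max(a, b) / min(a, b) < factor)
    return within / len(x)


# Hypothetical mRNA levels for five genes in two samples (arbitrary units).
sample_a = [100.0, 250.0, 40.0, 800.0, 60.0]
sample_b = [110.0, 240.0, 70.0, 790.0, 55.0]

r = pearson(sample_a, sample_b)
frac = fraction_within_factor(sample_a, sample_b)
print(f"correlation coefficient: {r:.3f}")
print(f"fraction within a factor of 1.5: {frac:.2f}")
```

With real data, `sample_a` and `sample_b` would each hold one value per gene, in the same gene order.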

15. The numbers and identities of known and putative
genes, and their homologies to other genes, were
gathered from the following public databases:
Saccharomyces Genome Database (genome-www.stanford.edu),
Yeast Protein Database (quest7.proteome.com), and
Munich Information Centre for Protein Sequences
(speedy.mips.biochem.mpg.de/mips/yeast/index.htmlx).
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36. For example, we observed large inductions of the
genes coding for PCK1, FBP1 [Z. Yin et al., Mol.
Microbiol. 20, 751 (1996)], the central glyoxylate
cycle gene ICL1 [A. Scholer and H. J. Schuller,
Curr. Genet. 23, 375 (1993)], and the "aerobic"
isoform of acetyl-CoA synthase, ACS1 [M. A. van
den Berg et al., J. Biol. Chem. 271, 28953 (1996)],
with concomitant down-regulation of the
glycolytic-specific genes PYK1 and PFK2 [P. A. Moore et
al., Mol. Cell. Biol. 11, 5330 (1991)]. Other genes
not directly involved in carbon metabolism but
known to be induced upon nutrient limitation
include genes encoding cytosolic catalase T CTT1
[P. H. Bissinger et al., ibid. 9, 1309 (1989)] and
several genes encoding small heat-shock proteins,
such as HSP12, HSP26, and HSP42 [I. Farkas et
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2717 (1996)].

37. The levels of induction we measured for genes that
were expressed at very low levels in the uninduced
state (notably, FBP1 and PCK1) were generally lower
than those previously reported. This discrepancy
was likely due to the conservative background
subtraction method we used, which generally resulted in
overestimation of very low expression levels (46).

38. Cross-hybridization of highly related sequences can
also occasionally obscure changes in gene expression,
an important concern where members of gene
families are functionally specialized and differentially
regulated. The major alcohol dehydrogenase genes,
ADH1 and ADH2, share 88% nucleotide identity.
Reciprocal regulation of these genes is an important
feature of the diauxic shift, but was not observed in
this experiment, presumably because of
cross-hybridization of the fluorescent cDNAs representing
these two genes. Nevertheless, we were able to
detect differential expression of closely related isoforms
of other enzymes, such as HXK1/HXK2 (77%
identical) [P. Herrero et al., Yeast 11, 137 (1995)], MLS1/
DAL7 (73% identical) (20), and PGM1/PGM2 (72%
identical) [D. Oh and J. E. Hopper, Mol. Cell. Biol. 10,
1415 (1990)], in accord with previous studies. Use in
the microarray of deliberately selected DNA
sequences corresponding to the most divergent
segments of homologous genes, in lieu of the complete
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dance between the sets of genes that appeared to 
be induced was very high between the two
experiments. When only the 355 genes that showed at
least a twofold increase in mRNA in the tup1Δ strain
in either of the duplicate experiments were
compared, the correlation coefficient was 0.82.

42. The tup1Δ mutation consists of an insertion of the
LEU2 coding sequence, including a stop codon,
between the ATG of TUP1 and an Eco RI site 124 base
pairs before the stop codon of the TUP1 gene.

43. L. R. Kowalski, K. Kondo, M. Inouye, Mol. Microbiol.
15, 341 (1995).

44. M. Viswanathan, G. Muthukumar, Y. S. Cong, J.
Lenard, Gene 148, 149 (1994).
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242, 250 (1994).
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Microarrays were scanned using a custom-built
scanning laser microscope built by S. Smith with
software written by N. Ziv. Details concerning scanner
design and construction are available at
cmgm.stanford.edu/pbrown. Images were scanned at a
resolution of 20 µm per pixel. A separate scan, using
the appropriate excitation line, was done for each of
the two fluorophores used. During the scanning
process, the ratio between the signals in the two
channels was calculated for several array elements
containing total genomic DNA. To normalize the two
channels with respect to overall intensity, we then
adjusted photomultiplier and laser power settings
such that the signal ratio at these elements was as
close to 1.0 as possible. The combined images were
analyzed with custom-written software. A bounding
box, fitted to the size of the DNA spots in each
quadrant, was placed over each array element. The
average fluorescent intensity was calculated by
summing the intensities of each pixel present in a
bounding box, and then dividing by the total number of
pixels. Local area background was calculated for
each array element by determining the average
fluorescent intensity for the lower 20% of pixel
intensities. Although this method tends to underestimate
the background, causing an underestimation of
extreme ratios, it produces a very consistent and
noise-tolerant approximation. Although the
analog-to-digital board used for data collection possesses a
wide dynamic range (12 bits), several signals were
saturated (greater than the maximum signal intensity
allowed) at the chosen settings. Therefore, extreme
ratios at bright elements are generally underestimated.
A signal was deemed significant if the average
intensity after background subtraction was at least
2.5-fold higher than the standard deviation in the
background measurements for all elements on the
array.
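
The quantification steps described in this note can be sketched as follows. This is a minimal illustration under stated assumptions, not the custom-written software referred to above. The pixel values and function names are hypothetical. Reading "standard deviation in the background measurements" as the population standard deviation of the per-element background estimates is also an assumption.

```python
from statistics import mean, pstdev


def spot_intensity(pixels):
    """Average intensity: sum of all pixels in the bounding box / pixel count."""
    return sum(pixels) / len(pixels)


def local_background(pixels, fraction=0.20):
    """Mean of the lowest `fraction` of pixel intensities in the bounding box."""
    ordered = sorted(pixels)
    n = max(1, int(len(ordered) * fraction))  # keep at least one pixel
    return mean(ordered[:n])


def quantify(boxes, threshold=2.5):
    """Return (background-subtracted intensity, is_significant) per element.

    `boxes` holds one flat list of pixel intensities per array element.
    """
    backgrounds = [local_background(box) for box in boxes]
    # Assumption: spread of the background estimates across all elements.
    background_sd = pstdev(backgrounds)
    results = []
    for box, background in zip(boxes, backgrounds):
        net = spot_intensity(box) - background
        results.append((net, net >= threshold * background_sd))
    return results


# Hypothetical bounding boxes of 10 pixels each, for one channel.
boxes = [
    [10, 12, 11, 13, 200, 210, 220, 205, 215, 12],  # bright spot
    [10, 11, 12, 13, 12, 11, 10, 14, 13, 12],       # near background
    [14, 15, 16, 15, 14, 16, 15, 17, 15, 14],       # near background
]

results = quantify(boxes)
for net, significant in results:
    print(f"net intensity {net:.1f}, significant: {significant}")
```

Running the same function on each of the two fluorophore channels and dividing the net intensities element by element would give the expression ratios. Saturated pixels are not handled in this sketch.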

In addition to the 17 genes shown in Table 1, three
additional genes were induced by an average of
more than threefold in the duplicate experiments, but
in one of the two experiments, the induction was less
than twofold (range 1.6- to 1.9-fold).
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